To clarify…
On 02/07/2024 5:54 pm, Calum Mackay wrote:
> hi Petr,
>
> I noticed your LTP patch [1][2] which adjusts the nfsstat01 test on v6.9
> kernels, to account for Josef's changes [3], which restrict the NFS/RPC
> stats per-namespace.
>
> I see that Josef's changes were backported, as far back as longterm
> v5.4,
Sorry, that's not quite accurate.
Josef's NFS client changes were all backported from v6.9, as far as
longterm v5.4.y:
2057a48d0dd0 sunrpc: add a struct rpc_stats arg to rpc_create_args
d47151b79e32 nfs: expose /proc/net/sunrpc/nfs in net namespaces
1548036ef120 nfs: make the rpc_stat per net namespace
Of Josef's NFS server changes, four were backported from v6.9 to v6.8:
418b9687dece sunrpc: use the struct net as the svc proc private
d98416cc2154 nfsd: rename NFSD_NET_* to NFSD_STATS_*
93483ac5fec6 nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
4b14885411f7 nfsd: make all of the nfsd stats per-network namespace
and the others remained only in v6.9:
ab42f4d9a26f sunrpc: don't change ->sv_stats if it doesn't exist
a2214ed588fb nfsd: stop setting ->pg_stats for unused stats
f09432386766 sunrpc: pass in the sv_stats struct through svc_create_pooled
3f6ef182f144 sunrpc: remove ->pg_stats from svc_program
e41ee44cc6a4 nfsd: remove nfsd_stats, make th_cnt a global counter
16fb9808ab2c nfsd: make svc_stat per-network namespace instead of global
I'm wondering if this difference between NFS client, and NFS server,
stat behaviour, across kernel versions, may perhaps cause some user
confusion?
cheers,
calum.
> so your check for kernel version "6.9" in the test may need to be
> adjusted, if LTP is intended to be run on stable kernels?
>
> best wishes,
> calum.
>
>
> [1] https://lore.kernel.org/ltp/20240620111129.594449-1-pvorel@suse.cz/
> [2] https://patchwork.ozlabs.org/project/ltp/
> patch/20240620111129.594449-1-pvorel(a)suse.cz/
> [3] https://lore.kernel.org/linux-nfs/
> cover.1708026931.git.josef(a)toxicpanda.com/
We recently made GUP's common page table walking code to also walk hugetlb
VMAs without most hugetlb special-casing, preparing for the future of
having less hugetlb-specific page table walking code in the codebase.
Turns out that we missed one page table locking detail: page table locking
for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page
tables we can have with CONFIG_HIGHPTE.
Take care of PTE tables possibly spanning multiple pages, and take care of
CONFIG_PGTABLE_LEVELS complexity when e.g., PMD_SIZE == PUD_SIZE. For
example, with CONFIG_PGTABLE_LEVELS == 2, core-mm would detect
with hugepagesize==PMD_SIZE pmd_leaf() and use the pmd_lockptr(), which
would end up just mapping to the per-MM PT lock.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of both
PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Reviewed-by: James Houghton <jthoughton(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
Third time is the charm?
Retested on arm64 and x86-64. Cross-compiled on a bunch of others.
v2 -> v3:
* Handle CONFIG_PGTABLE_LEVELS oddities as good as possible. It's a mess.
Remove the size >= P4D_SIZE check and simply default to the
&mm->page_table_lock.
* Align the PTE pointer to the start of the page table to handle PTE page
tables bigger than a single page (unclear if this could currently trigger).
* Extend patch description
v1 -> 2:
* Extend patch description
* Drop "mm: let pte_lockptr() consume a pte_t pointer"
* Introduce ptep_lockptr() in this patch
---
include/linux/hugetlb.h | 27 +++++++++++++++++++++++++--
include/linux/mm.h | 22 ++++++++++++++++++++++
2 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c9bf68c239a01..e6437a06e2346 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -944,9 +944,32 @@ static inline bool htlb_allow_alloc_fallback(int reason)
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
- if (huge_page_size(h) == PMD_SIZE)
+ unsigned long size = huge_page_size(h);
+
+ VM_WARN_ON(size == PAGE_SIZE);
+
+ /*
+ * hugetlb must use the exact same PT locks as core-mm page table
+ * walkers would. When modifying a PTE table, hugetlb must take the
+ * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+ * PT lock etc.
+ *
+ * The expectation is that any hugetlb folio smaller than a PMD is
+ * always mapped into a single PTE table and that any hugetlb folio
+ * smaller than a PUD (but at least as big as a PMD) is always mapped
+ * into a single PMD table.
+ *
+ * If that does not hold for an architecture, then that architecture
+ * must disable split PT locks such that all *_lockptr() functions
+ * will give us the same result: the per-MM PT lock.
+ */
+ if (size < PMD_SIZE && !IS_ENABLED(CONFIG_HIGHPTE))
+ /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+ return ptep_lockptr(mm, pte);
+ else if (size < PUD_SIZE || CONFIG_PGTABLE_LEVELS == 2)
return pmd_lockptr(mm, (pmd_t *) pte);
- VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
+ else if (size < P4D_SIZE || CONFIG_PGTABLE_LEVELS == 3)
+ return pud_lockptr(mm, (pud_t *) pte);
return &mm->page_table_lock;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b100df8cb5857..f6c7fe8f5746f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2926,6 +2926,24 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
}
+static inline struct page *ptep_pgtable_page(pte_t *pte)
+{
+ unsigned long mask = ~(PTRS_PER_PTE * sizeof(pte_t) - 1);
+
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+ return virt_to_page((void *)((unsigned long)pte & mask));
+}
+
+static inline struct ptdesc *ptep_ptdesc(pte_t *pte)
+{
+ return page_ptdesc(ptep_pgtable_page(pte));
+}
+
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return ptlock_ptr(ptep_ptdesc(pte));
+}
+
static inline bool ptlock_init(struct ptdesc *ptdesc)
{
/*
@@ -2950,6 +2968,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
return &mm->page_table_lock;
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
static inline void ptlock_cache_init(void) {}
static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
static inline void ptlock_free(struct ptdesc *ptdesc) {}
--
2.45.2
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue. Which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(), this happens because DSTS[[21:18]
wasn't updated to U0 yet. It is observed that the latency of DSTS can be
in order of milli-seconds. Hence update the dwc->link_state from evtinfo,
and use this variable to prevent calling remote wakup unnecessarily.
Fixes: ecba9bc9946b ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/dwc3/gadget.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..3b55285118b0 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -328,7 +328,8 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
}
if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
+ int link_state;
+ bool remote_wakeup = false;
/*
* Initiate remote wakeup if the link state is in U3 when
@@ -339,15 +340,26 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
link_state = dwc3_gadget_get_link_state(dwc);
switch (link_state) {
case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
+ if (dwc->gadget->speed < USB_SPEED_SUPER)
+ remote_wakeup = true;
+ break;
+ case DWC3_LINK_STATE_U3:
+ /*
+ * In HS, DSTS can take few milliseconds to update linkstate bits,
+ * so rely on dwc->link_state to identify whether gadget woke up.
+ * Don't issue remote wakuep again if link is already in U0.
+ */
+ if (dwc->link_state == DWC3_LINK_STATE_U0)
break;
- fallthrough;
- case DWC3_LINK_STATE_U3:
+ remote_wakeup = true;
+ break;
+ }
+
+ if (remote_wakeup) {
ret = __dwc3_gadget_wakeup(dwc, false);
dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
ret);
- break;
}
}
@@ -4214,6 +4226,7 @@ static void dwc3_gadget_conndone_interrupt(struct dwc3 *dwc)
static void dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc, unsigned int evtinfo)
{
dwc->suspended = false;
+ dwc->link_state = evtinfo & DWC3_LINK_STATE_MASK;
/*
* TODO take core out of low power mode when that's
@@ -4225,8 +4238,6 @@ static void dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc, unsigned int evtinfo)
dwc->gadget_driver->resume(dwc->gadget);
spin_lock(&dwc->lock);
}
-
- dwc->link_state = evtinfo & DWC3_LINK_STATE_MASK;
}
static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
--
2.25.1
From: Jakub Kicinski <kuba(a)kernel.org>
[ Upstream commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb ]
Similarly to previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.
Reported-by: valis <sec(a)valis.email>
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Reviewed-by: Sabrina Dubroca <sd(a)queasysnail.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 6db22d6c7a6dc914b12c0469b94eb639b6a8a146)
[Lee: Fixed merge-conflict in Stable branches linux-6.1.y and older]
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
net/tls/tls_sw.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2bd27b77769cb..d53587ff9ddea 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -449,7 +449,6 @@ static void tls_encrypt_done(crypto_completion_data_t *data, int err)
struct scatterlist *sge;
struct sk_msg *msg_en;
struct tls_rec *rec;
- bool ready = false;
struct sock *sk;
rec = container_of(aead_req, struct tls_rec, aead_req);
@@ -486,19 +485,16 @@ static void tls_encrypt_done(crypto_completion_data_t *data, int err)
/* If received record is at head of tx_list, schedule tx */
first_rec = list_first_entry(&ctx->tx_list,
struct tls_rec, list);
- if (rec == first_rec)
- ready = true;
+ if (rec == first_rec) {
+ /* Schedule the transmission */
+ if (!test_and_set_bit(BIT_TX_SCHEDULED,
+ &ctx->tx_bitmask))
+ schedule_delayed_work(&ctx->tx_work.work, 1);
+ }
}
if (atomic_dec_and_test(&ctx->encrypt_pending))
complete(&ctx->async_wait.completion);
-
- if (!ready)
- return;
-
- /* Schedule the transmission */
- if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
- schedule_delayed_work(&ctx->tx_work.work, 1);
}
static int tls_encrypt_async_wait(struct tls_sw_context_tx *ctx)
--
2.44.0.278.ge034bb2e1d-goog
commit 11a1f4bc47362700fcbde717292158873fb847ed upstream.
Keith reports a use-after-free when a DPC event occurs concurrently to
hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the
Downstream Port where the DPC event occurred. To do so, it polls the
config space of the first child device on the secondary bus. If that
child device is concurrently removed, accesses to its struct pci_dev
cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
reference on the child device. Before v6.3, the function was only
called on resume from system sleep or on runtime resume. Holding a
reference wasn't necessary back then because the pciehp IRQ thread
could never run concurrently. (On resume from system sleep, IRQs are
not enabled until after the resume_noirq phase. And runtime resume is
always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness
of secondary bus after reset"), which introduced that, failed to
appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
reference on the child device because dpc_handler() and pciehp may
indeed run concurrently. The commit was backported to v5.10+ stable
kernels, so that's the oldest one affected.
Add the missing reference acquisition.
Abridged stack trace:
BUG: unable to handle page fault for address: 00000000091400c0
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
RIP: pci_bus_read_config_dword+0x17/0x50
pci_dev_wait()
pci_bridge_wait_for_secondary_bus()
dpc_reset_link()
pcie_do_recovery()
dpc_handler()
Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset")
Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.…
Reported-by: Keith Busch <kbusch(a)kernel.org>
Tested-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
---
drivers/pci/pci.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 530ced8f7abd..09d5fa637b98 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4817,7 +4817,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
int timeout)
{
struct pci_dev *child;
- int delay;
+ int delay, ret = 0;
if (pci_dev_is_disconnected(dev))
return 0;
@@ -4845,8 +4845,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
return 0;
}
- child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
- bus_list);
+ child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
+ struct pci_dev, bus_list));
up_read(&pci_bus_sem);
/*
@@ -4856,7 +4856,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
if (!pci_is_pcie(dev)) {
pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
msleep(1000 + delay);
- return 0;
+ goto put_child;
}
/*
@@ -4877,7 +4877,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
* until the timeout expires.
*/
if (!pcie_downstream_port(dev))
- return 0;
+ goto put_child;
if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
@@ -4888,11 +4888,16 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
if (!pcie_wait_for_link_delay(dev, true, delay)) {
/* Did not train, no need to wait any further */
pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
- return -ENOTTY;
+ ret = -ENOTTY;
+ goto put_child;
}
}
- return pci_dev_wait(child, reset_type, timeout - delay);
+ ret = pci_dev_wait(child, reset_type, timeout - delay);
+
+put_child:
+ pci_dev_put(child);
+ return ret;
}
void pci_reset_secondary_bus(struct pci_dev *dev)
--
2.43.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3415b10a03945b0da4a635e146750dfe5ce0f448
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073000-polish-wish-b988@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
3415b10a0394 ("kbuild: Fix '-S -c' in x86 stack protector scripts")
3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
c9a1ff316bc9 ("x86/stackprotector: Pre-initialize canary for secondary CPUs")
dc4e0021b00b ("x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area")
e99b6f46ee5c ("x86/doublefault/32: Rename doublefault.c to doublefault_32.c")
93efbde2c331 ("x86/traps: Disentangle the 32-bit and 64-bit doublefault code")
ab851d49f6bf ("Merge branch 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3415b10a03945b0da4a635e146750dfe5ce0f448 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 26 Jul 2024 11:05:00 -0700
Subject: [PATCH] kbuild: Fix '-S -c' in x86 stack protector scripts
After a recent change in clang to stop consuming all instances of '-S'
and '-c' [1], the stack protector scripts break due to the kernel's use
of -Werror=unused-command-line-argument to catch cases where flags are
not being properly consumed by the compiler driver:
$ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]
This results in CONFIG_STACKPROTECTOR getting disabled because
CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.
'-c' and '-S' both instruct the compiler to stop at different stages of
the pipeline ('-S' after compiling, '-c' after assembling), so having
them present together in the same command makes little sense. In this
case, the test wants to stop before assembling because it is looking at
the textual assembly output of the compiler for either '%fs' or '%gs',
so remove '-c' from the list of arguments to resolve the error.
All versions of GCC continue to work after this change, along with
versions of clang that do or do not contain the change mentioned above.
Cc: stable(a)vger.kernel.org
Fixes: 4f7fd4d7a791 ("[PATCH] Add the -fstack-protector option to the CFLAGS")
Fixes: 60a5317ff0f4 ("x86: implement x86_32 stack protector")
Link: https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353… [1]
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
index 825c75c5b715..9459ca4f0f11 100755
--- a/scripts/gcc-x86_32-has-stack-protector.sh
+++ b/scripts/gcc-x86_32-has-stack-protector.sh
@@ -5,4 +5,4 @@
# -mstack-protector-guard-reg, added by
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
diff --git a/scripts/gcc-x86_64-has-stack-protector.sh b/scripts/gcc-x86_64-has-stack-protector.sh
index 75e4e22b986a..f680bb01aeeb 100755
--- a/scripts/gcc-x86_64-has-stack-protector.sh
+++ b/scripts/gcc-x86_64-has-stack-protector.sh
@@ -1,4 +1,4 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m64 -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -m64 -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
Some older systems still compile kernels with old gcc version.
$ grep -B3 "^GNU C" linux-5.10.223-rc1/Documentation/Changes
====================== =============== ========================================
Program Minimal version Command to check the version
====================== =============== ========================================
GNU C 4.9 gcc --version
These warnings and errors show up when compiling with gcc 4.9.2
UPD include/config/kernel.release
UPD include/generated/uapi/linux/version.h
UPD include/generated/utsrelease.h
CC scripts/mod/empty.o
In file included from ././include/linux/compiler_types.h:65:0,
from <command-line>:0:
./include/linux/compiler_attributes.h:29:29: warning: "__GCC4_has_attribute___uninitialized__" is not defined [-Wundef]
# define __has_attribute(x) __GCC4_has_attribute_##x
^
./include/linux/compiler_attributes.h:278:5: note: in expansion of macro '__has_attribute'
#if __has_attribute(__uninitialized__)
^
[SNIP]
AR arch/x86/events/built-in.a
CC arch/x86/kvm/../../../virt/kvm/kvm_main.o
In file included from ././include/linux/compiler_types.h:65:0,
from <command-line>:0:
./include/linux/compiler_attributes.h:29:29: error: "__GCC4_has_attribute___uninitialized__" is not defined [-Werror=undef]
# define __has_attribute(x) __GCC4_has_attribute_##x
^
./include/linux/compiler_attributes.h:278:5: note: in expansion of macro '__has_attribute'
#if __has_attribute(__uninitialized__)
^
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:286: arch/x86/kvm/../../../virt/kvm/kvm_main.o] Error 1
make[1]: *** [scripts/Makefile.build:503: arch/x86/kvm] Error 2
make: *** [Makefile:1832: arch/x86] Error 2
Following patch fixes this. Upstream won't need this because
newer kernels are not compilable with gcc 4.9
--- ./include/linux/compiler_attributes.h.OLD
+++ ./include/linux/compiler_attributes.h
@@ -37,6 +37,7 @@
# define __GCC4_has_attribute___nonstring__ 0
# define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
# define __GCC4_has_attribute___no_sanitize_undefined__ (__GNUC_MINOR__ >= 9)
+# define __GCC4_has_attribute___uninitialized__ 0
# define __GCC4_has_attribute___fallthrough__ 0
# define __GCC4_has_attribute___warning__ 1
#endif
Fixes: upstream fd7eea27a3ae "Compiler Attributes: Add __uninitialized macro"
Signed-off-by: Jari Ruusu <jariruusu(a)protonmail.com>
--
Jari Ruusu 4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD ACDF F073 3C80 8132 F189
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5596d9e8b553dacb0ac34bcf873cbbfb16c3ba3e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071559-unroasted-trapper-8b66@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
5596d9e8b553 ("mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()")
bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
51718e25c53f ("mm: convert arch_clear_hugepage_flags to take a folio")
831bc31a5e82 ("mm: hugetlb: improve the handling of hugetlb allocation failure for freed or in-use hugetlb")
ebc20dcac4ce ("mm: hugetlb_vmemmap: convert page to folio")
c5ad3233ead5 ("hugetlb_vmemmap: use folio argument for hugetlb_vmemmap_* functions")
c24f188b2289 ("hugetlb: batch TLB flushes when restoring vmemmap")
f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
f4b7e3efaddb ("hugetlb: batch PMD split for bulk vmemmap dedup")
91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
cfb8c75099db ("hugetlb: perform vmemmap restoration on a list of pages")
79359d6d24df ("hugetlb: perform vmemmap optimization on a list of pages")
d67e32f26713 ("hugetlb: restructure pool allocations")
d2cf88c27f51 ("hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles")
30a89adf872d ("hugetlb: check for hugetlb folio before vmemmap_restore")
d5b43e9683ec ("hugetlb: convert remove_pool_huge_page() to remove_pool_hugetlb_folio()")
04bbfd844b99 ("hugetlb: remove a few calls to page_folio()")
fde1c4ecf916 ("mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO")
3ee0aa9f0675 ("mm: move some shrinker-related function declarations to mm/internal.h")
d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5596d9e8b553dacb0ac34bcf873cbbfb16c3ba3e Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Mon, 8 Jul 2024 10:51:27 +0800
Subject: [PATCH] mm/hugetlb: fix potential race in
__update_and_free_hugetlb_folio()
There is a potential race between __update_and_free_hugetlb_folio() and
try_memory_failure_hugetlb():
CPU1 CPU2
__update_and_free_hugetlb_folio try_memory_failure_hugetlb
folio_test_hugetlb
-- It's still hugetlb folio.
folio_clear_hugetlb_hwpoison
spin_lock_irq(&hugetlb_lock);
__get_huge_page_for_hwpoison
folio_set_hugetlb_hwpoison
spin_unlock_irq(&hugetlb_lock);
spin_lock_irq(&hugetlb_lock);
__folio_clear_hugetlb(folio);
-- Hugetlb flag is cleared but too late.
spin_unlock_irq(&hugetlb_lock);
When the above race occurs, raw error page info will be leaked. Even
worse, raw error pages won't have hwpoisoned flag set and hit
pcplists/buddy. Fix this issue by deferring
folio_clear_hugetlb_hwpoison() until __folio_clear_hugetlb() is done. So
all raw error pages will have hwpoisoned flag set.
Link: https://lkml.kernel.org/r/20240708025127.107713-1-linmiaohe@huawei.com
Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2afb70171b76..fe44324d6383 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1725,13 +1725,6 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
return;
}
- /*
- * Move PageHWPoison flag from head page to the raw error pages,
- * which makes any healthy subpages reusable.
- */
- if (unlikely(folio_test_hwpoison(folio)))
- folio_clear_hugetlb_hwpoison(folio);
-
/*
* If vmemmap pages were allocated above, then we need to clear the
* hugetlb flag under the hugetlb lock.
@@ -1742,6 +1735,13 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
spin_unlock_irq(&hugetlb_lock);
}
+ /*
+ * Move PageHWPoison flag from head page to the raw error pages,
+ * which makes any healthy subpages reusable.
+ */
+ if (unlikely(folio_test_hwpoison(folio)))
+ folio_clear_hugetlb_hwpoison(folio);
+
folio_ref_unfreeze(folio, 1);
/*