April 2023 - Linux-stable-mirror

[PATCH RESEND 6.1.y,6.2.y] purgatory: fix disabling debug info

by Alyssa Ross

Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS. Instead, it includes -g, the appropriate -gdwarf-* flag, and also the -Wa versions of both of those if building with Clang and GNU as. As a result, debug info was being generated for the purgatory objects, even though the intention was that it not be. Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files") Signed-off-by: Alyssa Ross <hi(a)alyssa.is> Cc: stable(a)vger.kernel.org Acked-by: Nick Desaulniers <ndesaulniers(a)google.com> Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org> (cherry picked from commit d83806c4c0cccc0d6d3c3581a11983a9c186a138) --- arch/riscv/purgatory/Makefile | 4 +--- arch/x86/purgatory/Makefile | 3 +-- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile index dd58e1d99397..659e21862077 100644 --- a/arch/riscv/purgatory/Makefile +++ b/arch/riscv/purgatory/Makefile @@ -74,9 +74,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS) CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE) CFLAGS_ctype.o += $(PURGATORY_CFLAGS) -AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2 -AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2 -AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2 +asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x)) $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE $(call if_changed,ld) diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile index 17f09dc26381..82fec66d46d2 100644 --- a/arch/x86/purgatory/Makefile +++ b/arch/x86/purgatory/Makefile @@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS) CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE) CFLAGS_string.o += $(PURGATORY_CFLAGS) -AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2 -AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2 +asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x)) $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE $(call if_changed,ld) base-commit: cdc7aff9ed012801e62eedd99e4a5573eccac4db -- 2.37.1

2 years, 1 month

1
0
0 0

[merged mm-nonmm-stable] ia64-fix-an-addr-to-taddr-in-huge_pte_offset.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: ia64: fix an addr to taddr in huge_pte_offset() has been removed from the -mm tree. Its filename was ia64-fix-an-addr-to-taddr-in-huge_pte_offset.patch This patch was dropped because it was merged into the mm-nonmm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Hugh Dickins <hughd(a)google.com> Subject: ia64: fix an addr to taddr in huge_pte_offset() Date: Sun, 16 Apr 2023 22:17:05 -0700 (PDT) I know nothing of ia64 htlbpage_to_page(), but guess that the p4d line should be using taddr rather than addr, like everywhere else. Link: https://lkml.kernel.org/r/732eae88-3beb-246-2c72-281de786740@google.com Fixes: c03ab9e32a2c ("ia64: add support for folded p4d page tables") Signed-off-by: Hugh Dickins <hughd(a)google.com Acked-by: Mike Kravetz <mike.kravetz(a)oracle.com> Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org> Cc: Ard Biesheuvel <ardb(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/ia64/mm/hugetlbpage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/ia64/mm/hugetlbpage.c~ia64-fix-an-addr-to-taddr-in-huge_pte_offset +++ a/arch/ia64/mm/hugetlbpage.c @@ -58,7 +58,7 @@ huge_pte_offset (struct mm_struct *mm, u pgd = pgd_offset(mm, taddr); if (pgd_present(*pgd)) { - p4d = p4d_offset(pgd, addr); + p4d = p4d_offset(pgd, taddr); if (p4d_present(*p4d)) { pud = pud_offset(p4d, taddr); if (pud_present(*pud)) { _ Patches currently in -mm which might be from hughd(a)google.com are

2 years, 1 month

1
0
0 0

[merged mm-stable] mm-hugetlb-fix-uffd-wp-during-fork.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/hugetlb: fix uffd-wp during fork() has been removed from the -mm tree. Its filename was mm-hugetlb-fix-uffd-wp-during-fork.patch This patch was dropped because it was merged into the mm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Peter Xu <peterx(a)redhat.com> Subject: mm/hugetlb: fix uffd-wp during fork() Date: Mon, 17 Apr 2023 15:53:12 -0400 Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins", v2. This patch (of 6): There're a bunch of things that were wrong: - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp() rather than huge_pte_uffd_wp(). - When copying over a pte, we should drop uffd-wp bit when !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)). - When doing early CoW for private hugetlb (e.g. when the parent page was pinned), uffd-wp bit should be properly carried over if necessary. No bug reported probably because most people do not even care about these corner cases, but they are still bugs and can be exposed by the recent unit tests introduced, so fix all of them in one shot. Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()") Signed-off-by: Peter Xu <peterx(a)redhat.com> Reviewed-by: David Hildenbrand <david(a)redhat.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: Mika Penttil�� <mpenttil(a)redhat.com> Cc: Mike Kravetz <mike.kravetz(a)oracle.com> Cc: Nadav Amit <nadav.amit(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/hugetlb.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) --- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wp-during-fork +++ a/mm/hugetlb.c @@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned( static void hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr, - struct folio *new_folio) + struct folio *new_folio, pte_t old) { + pte_t newpte = make_huge_pte(vma, &new_folio->page, 1); + __folio_mark_uptodate(new_folio); hugepage_add_new_anon_rmap(new_folio, vma, addr); - set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1)); + if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old)) + newpte = huge_pte_mkuffd_wp(newpte); + set_huge_pte_at(vma->vm_mm, addr, ptep, newpte); hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm); folio_set_hugetlb_migratable(new_folio); } @@ -5032,14 +5036,12 @@ again: */ ; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) { - bool uffd_wp = huge_pte_uffd_wp(entry); - - if (!userfaultfd_wp(dst_vma) && uffd_wp) + if (!userfaultfd_wp(dst_vma)) entry = huge_pte_clear_uffd_wp(entry); set_huge_pte_at(dst, addr, dst_pte, entry); } else if (unlikely(is_hugetlb_entry_migration(entry))) { swp_entry_t swp_entry = pte_to_swp_entry(entry); - bool uffd_wp = huge_pte_uffd_wp(entry); + bool uffd_wp = pte_swp_uffd_wp(entry); if (!is_readable_migration_entry(swp_entry) && cow) { /* @@ -5050,10 +5052,10 @@ again: swp_offset(swp_entry)); entry = swp_entry_to_pte(swp_entry); if (userfaultfd_wp(src_vma) && uffd_wp) - entry = huge_pte_mkuffd_wp(entry); + entry = pte_swp_mkuffd_wp(entry); set_huge_pte_at(src, addr, src_pte, entry); } - if (!userfaultfd_wp(dst_vma) && uffd_wp) + if (!userfaultfd_wp(dst_vma)) entry = huge_pte_clear_uffd_wp(entry); set_huge_pte_at(dst, addr, dst_pte, entry); } else if (unlikely(is_pte_marker(entry))) { @@ -5118,7 +5120,8 @@ again: /* huge_ptep of dst_pte won't change as in child */ goto again; } - hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio); + hugetlb_install_folio(dst_vma, dst_pte, addr, + new_folio, src_pte_old); spin_unlock(src_ptl); spin_unlock(dst_ptl); continue; @@ -5136,6 +5139,9 @@ again: entry = huge_pte_wrprotect(entry); } + if (!userfaultfd_wp(dst_vma)) + entry = huge_pte_clear_uffd_wp(entry); + set_huge_pte_at(dst, addr, dst_pte, entry); hugetlb_count_add(npages, dst); } _ Patches currently in -mm which might be from peterx(a)redhat.com are

2 years, 1 month

1
0
0 0

[PATCH v2 4/4] xhci: Add zhaoxin xHCI U1/U2 feature support

by Weitao Wang

Add U1/U2 feature support of xHCI for zhaoxin. Since both Intel and Zhaoxin need to check the tier where the device is located to determine whether to enabled U1/U2, remove the previous Intel U1/U2 tier policy and add common policy in xhci_check_tier_policy. If vendor has specific U1/U2 enable policy,quirks can be add to declare. Suggested-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> Cc: stable(a)vger.kernel.org Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com> --- v1->v2 - Modify the description. - Adjust U1/U2 tier enable policy. drivers/usb/host/xhci.c | 43 +++++++++++++++++------------------------ 1 file changed, 18 insertions(+), 25 deletions(-) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 31d6ace9cace..b81a69126188 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -4802,7 +4802,7 @@ static u16 xhci_calculate_u1_timeout(struct xhci_hcd *xhci, } } - if (xhci->quirks & XHCI_INTEL_HOST) + if (xhci->quirks & (XHCI_INTEL_HOST | XHCI_ZHAOXIN_HOST)) timeout_ns = xhci_calculate_intel_u1_timeout(udev, desc); else timeout_ns = udev->u1_params.sel; @@ -4866,7 +4866,7 @@ static u16 xhci_calculate_u2_timeout(struct xhci_hcd *xhci, } } - if (xhci->quirks & XHCI_INTEL_HOST) + if (xhci->quirks & (XHCI_INTEL_HOST | XHCI_ZHAOXIN_HOST)) timeout_ns = xhci_calculate_intel_u2_timeout(udev, desc); else timeout_ns = udev->u2_params.sel; @@ -4938,37 +4938,30 @@ static int xhci_update_timeout_for_interface(struct xhci_hcd *xhci, return 0; } -static int xhci_check_intel_tier_policy(struct usb_device *udev, +static int xhci_check_tier_policy(struct xhci_hcd *xhci, + struct usb_device *udev, enum usb3_link_state state) { - struct usb_device *parent; - unsigned int num_hubs; + struct usb_device *parent = udev->parent; + int tier = 1; /* roothub is tier1 */ - /* Don't enable U1 if the device is on a 2nd tier hub or lower. */ - for (parent = udev->parent, num_hubs = 0; parent->parent; - parent = parent->parent) - num_hubs++; + while (parent) { + parent = parent->parent; + tier++; + } - if (num_hubs < 2) - return 0; + if (xhci->quirks & XHCI_INTEL_HOST && tier > 3) + goto fail; + if (xhci->quirks & XHCI_ZHAOXIN_HOST && tier > 2) + goto fail; - dev_dbg(&udev->dev, "Disabling U1/U2 link state for device" - " below second-tier hub.\n"); - dev_dbg(&udev->dev, "Plug device into first-tier hub " - "to decrease power consumption.\n"); + return 0; +fail: + dev_dbg(&udev->dev, "Tier policy prevents U1/U2 LPM states for devices at tier %d\n", + tier); return -E2BIG; } -static int xhci_check_tier_policy(struct xhci_hcd *xhci, - struct usb_device *udev, - enum usb3_link_state state) -{ - if (xhci->quirks & XHCI_INTEL_HOST) - return xhci_check_intel_tier_policy(udev, state); - else - return 0; -} - /* Returns the U1 or U2 timeout that should be enabled. * If the tier check or timeout setting functions return with a non-zero exit * code, that means the timeout value has been finalized and we shouldn't look -- 2.32.0

2 years, 1 month

1
0
0 0

[PATCH v2 2/4] xhci: fix issue of cross page boundary in TRB prefetch

by Weitao Wang

On some Zhaoxin platforms, xHCI will prefetch TRB for performance improvement. However this TRB prefetch mechanism may cross page boundary, which may access memory not allocated by xHCI driver. In order to fix this issue, two pages was allocated for TRB and only the first page will be used. Cc: stable(a)vger.kernel.org Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com> --- drivers/usb/host/xhci-mem.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index d0a9467aa5fc..d5517400d874 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -2369,8 +2369,12 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags) * and our use of dma addresses in the trb_address_map radix tree needs * TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need. */ - xhci->segment_pool = dma_pool_create("xHCI ring segments", dev, - TRB_SEGMENT_SIZE, TRB_SEGMENT_SIZE, xhci->page_size); + if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH) + xhci->segment_pool = dma_pool_create("xHCI ring segments", dev, + TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2); + else + xhci->segment_pool = dma_pool_create("xHCI ring segments", dev, + TRB_SEGMENT_SIZE, TRB_SEGMENT_SIZE, xhci->page_size); /* See Table 46 and Note on Figure 55 */ xhci->device_pool = dma_pool_create("xHCI input/output contexts", dev, -- 2.32.0

2 years, 1 month

1
0
0 0

[PATCH] PCI:ASPM: Remove pcie_aspm_pm_state_change()

by Mark Hasemeyer

commit 08d0cc5f34265d1a1e3031f319f594bd1970976c upstream. This change is desired because without it, it has been observed that re-applying aspm settings can cause the system to crash with certain pci devices (ie. Genesys GL9755). Tested by issuing 100 suspend/resume cycles on a symptomatic system running 5.15.107. L1 settings looked identical before and after: ``` localhost ~ # lspci -vvv -d 0x17a0: | grep L1Sub L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1SubCtl2: T_PwrOn=3100us ``` Cc: <stable(a)vger.kernel.org> # 5.15.y

2 years, 1 month

1
0
0 0

[PATCH v2] UHCI:adjust zhaoxin UHCI controllers OverCurrent bit value

by Weitao Wang

OverCurrent condition is not standardized in the UHCI spec. Zhaoxin UHCI controllers report OverCurrent bit active off. In order to handle OverCurrent condition correctly, the uhci-hcd driver needs to be told to expect the active-off behavior. Suggested-by: Alan Stern <stern(a)rowland.harvard.edu> Cc: stable(a)vger.kernel.org Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com> --- v1->v2 - Modify the description of this patch. - Let Zhaoxin and VIA share a common oc_low flag drivers/usb/host/uhci-pci.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c index 3592f757fe05..034586911bb5 100644 --- a/drivers/usb/host/uhci-pci.c +++ b/drivers/usb/host/uhci-pci.c @@ -119,11 +119,12 @@ static int uhci_pci_init(struct usb_hcd *hcd) uhci->rh_numports = uhci_count_ports(hcd); - /* Intel controllers report the OverCurrent bit active on. - * VIA controllers report it active off, so we'll adjust the - * bit value. (It's not standardized in the UHCI spec.) + /* Intel controllers report the OverCurrent bit active on. VIA + * and ZHAOXIN controllers report it active off, so we'll adjust + * the bit value. (It's not standardized in the UHCI spec.) */ - if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA) + if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA || + to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_ZHAOXIN) uhci->oc_low = 1; /* HP's server management chip requires a longer port reset delay. */ -- 2.32.0

2 years, 1 month

1
0
0 0

linux-next: build failure after merge of the tip tree

by broonie＠kernel.org

Hi all, After merging the tip tree, today's linux-next build (arm multi_v7_defconfig) failed like this: /tmp/next/build/kernel/time/posix-cpu-timers.c: In function 'posix_cpu_timer_wait_running_nsleep': /tmp/next/build/kernel/time/posix-cpu-timers.c:1310:30: error: 'timr' is a pointer; did you mean to use '->'? 1310 | spin_unlock_irq(&timr.it_lock); | ^ | -> /tmp/next/build/kernel/time/posix-cpu-timers.c:1312:28: error: 'timr' is a pointer; did you mean to use '->'? 1312 | spin_lock_irq(&timr.it_lock); | ^ | -> Caused by commit 2aaae4bf41b101f7e ("posix-cpu-timers: Implement the missing timer_wait_running callback") The !POSIX_CPU_TIMERS_TASK_WORK case wasn't fully updated. I've used the version of the tip tree from next-20230420 instead.

2 years, 1 month

1
0
0 0

HALLO

by Jennifer Trujillo

-- Hallo, Wie geht es dir?

2 years, 1 month

1
0
0 0

[tip: timers/core] posix-cpu-timers: Implement the missing timer_wait_running callback

by tip-bot2 for Thomas Gleixner

The following commit has been merged into the timers/core branch of tip: Commit-ID: f7abf14f0001a5a47539d9f60bbdca649e43536b Gitweb: https://git.kernel.org/tip/f7abf14f0001a5a47539d9f60bbdca649e43536b Author: Thomas Gleixner <tglx(a)linutronix.de> AuthorDate: Mon, 17 Apr 2023 15:37:55 +02:00 Committer: Thomas Gleixner <tglx(a)linutronix.de> CommitterDate: Fri, 21 Apr 2023 15:34:33 +02:00 posix-cpu-timers: Implement the missing timer_wait_running callback For some unknown reason the introduction of the timer_wait_running callback missed to fixup posix CPU timers, which went unnoticed for almost four years. Marco reported recently that the WARN_ON() in timer_wait_running() triggers with a posix CPU timer test case. Posix CPU timers have two execution models for expiring timers depending on CONFIG_POSIX_CPU_TIMERS_TASK_WORK: 1) If not enabled, the expiry happens in hard interrupt context so spin waiting on the remote CPU is reasonably time bound. Implement an empty stub function for that case. 2) If enabled, the expiry happens in task work before returning to user space or guest mode. The expired timers are marked as firing and moved from the timer queue to a local list head with sighand lock held. Once the timers are moved, sighand lock is dropped and the expiry happens in fully preemptible context. That means the expiring task can be scheduled out, migrated, interrupted etc. So spin waiting on it is more than suboptimal. The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock. This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way: - Add a mutex to struct posix_cputimers_work. This struct is per task and used to schedule the expiry task work from the timer interrupt. - Add a task_struct pointer to struct cpu_timer which is used to store a the task which runs the expiry. That's filled in when the task moves the expired timers to the local expiry list. That's not affecting the size of the k_itimer union as there are bigger union members already - Let the task take the expiry mutex around the expiry function - Let the waiter acquire a task reference with rcu_read_lock() held and block on the expiry mutex This avoids spin-waiting on a task which might not even be on a CPU and works nicely for RT too. Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT") Reported-by: Marco Elver <elver(a)google.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Tested-by: Marco Elver <elver(a)google.com> Tested-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx --- include/linux/posix-timers.h | 17 ++++--- kernel/time/posix-cpu-timers.c | 81 +++++++++++++++++++++++++++------ kernel/time/posix-timers.c | 4 ++- 3 files changed, 82 insertions(+), 20 deletions(-) diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 2c6e99c..d607f51 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -4,6 +4,7 @@ #include <linux/spinlock.h> #include <linux/list.h> +#include <linux/mutex.h> #include <linux/alarmtimer.h> #include <linux/timerqueue.h> @@ -62,16 +63,18 @@ static inline int clockid_to_fd(const clockid_t clk) * cpu_timer - Posix CPU timer representation for k_itimer * @node: timerqueue node to queue in the task/sig * @head: timerqueue head on which this timer is queued - * @task: Pointer to target task + * @pid: Pointer to target task PID * @elist: List head for the expiry list * @firing: Timer is currently firing + * @handling: Pointer to the task which handles expiry */ struct cpu_timer { - struct timerqueue_node node; - struct timerqueue_head *head; - struct pid *pid; - struct list_head elist; - int firing; + struct timerqueue_node node; + struct timerqueue_head *head; + struct pid *pid; + struct list_head elist; + int firing; + struct task_struct __rcu *handling; }; static inline bool cpu_timer_enqueue(struct timerqueue_head *head, @@ -135,10 +138,12 @@ struct posix_cputimers { /** * posix_cputimers_work - Container for task work based posix CPU timer expiry * @work: The task work to be scheduled + * @mutex: Mutex held around expiry in context of this task work * @scheduled: @work has been scheduled already, no further processing */ struct posix_cputimers_work { struct callback_head work; + struct mutex mutex; unsigned int scheduled; }; diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c index 2f5e9b3..e9c6f9d 100644 --- a/kernel/time/posix-cpu-timers.c +++ b/kernel/time/posix-cpu-timers.c @@ -846,6 +846,8 @@ static u64 collect_timerqueue(struct timerqueue_head *head, return expires; ctmr->firing = 1; + /* See posix_cpu_timer_wait_running() */ + rcu_assign_pointer(ctmr->handling, current); cpu_timer_dequeue(ctmr); list_add_tail(&ctmr->elist, firing); } @@ -1161,7 +1163,49 @@ static void handle_posix_cpu_timers(struct task_struct *tsk); #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK static void posix_cpu_timers_work(struct callback_head *work) { + struct posix_cputimers_work *cw = container_of(work, typeof(*cw), work); + + mutex_lock(&cw->mutex); handle_posix_cpu_timers(current); + mutex_unlock(&cw->mutex); +} + +/* + * Invoked from the posix-timer core when a cancel operation failed because + * the timer is marked firing. The caller holds rcu_read_lock(), which + * protects the timer and the task which is expiring it from being freed. + */ +static void posix_cpu_timer_wait_running(struct k_itimer *timr) +{ + struct task_struct *tsk = rcu_dereference(timr->it.cpu.handling); + + /* Has the handling task completed expiry already? */ + if (!tsk) + return; + + /* Ensure that the task cannot go away */ + get_task_struct(tsk); + /* Now drop the RCU protection so the mutex can be locked */ + rcu_read_unlock(); + /* Wait on the expiry mutex */ + mutex_lock(&tsk->posix_cputimers_work.mutex); + /* Release it immediately again. */ + mutex_unlock(&tsk->posix_cputimers_work.mutex); + /* Drop the task reference. */ + put_task_struct(tsk); + /* Relock RCU so the callsite is balanced */ + rcu_read_lock(); +} + +static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr) +{ + /* Ensure that timr->it.cpu.handling task cannot go away */ + rcu_read_lock(); + spin_unlock_irq(&timr->it_lock); + posix_cpu_timer_wait_running(timr); + rcu_read_unlock(); + /* @timr is on stack and is valid */ + spin_lock_irq(&timr->it_lock); } /* @@ -1177,6 +1221,7 @@ void clear_posix_cputimers_work(struct task_struct *p) sizeof(p->posix_cputimers_work.work)); init_task_work(&p->posix_cputimers_work.work, posix_cpu_timers_work); + mutex_init(&p->posix_cputimers_work.mutex); p->posix_cputimers_work.scheduled = false; } @@ -1255,6 +1300,18 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk) lockdep_posixtimer_exit(); } +static void posix_cpu_timer_wait_running(struct k_itimer *timr) +{ + cpu_relax(); +} + +static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr) +{ + spin_unlock_irq(&timr->it_lock); + cpu_relax(); + spin_lock_irq(&timr->it_lock); +} + static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk) { return false; @@ -1363,6 +1420,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk) */ if (likely(cpu_firing >= 0)) cpu_timer_fire(timer); + /* See posix_cpu_timer_wait_running() */ + rcu_assign_pointer(timer->it.cpu.handling, NULL); spin_unlock(&timer->it_lock); } } @@ -1497,23 +1556,16 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags, expires = cpu_timer_getexpires(&timer.it.cpu); error = posix_cpu_timer_set(&timer, 0, &zero_it, &it); if (!error) { - /* - * Timer is now unarmed, deletion can not fail. - */ + /* Timer is now unarmed, deletion can not fail. */ posix_cpu_timer_del(&timer); + } else { + while (error == TIMER_RETRY) { + posix_cpu_timer_wait_running_nsleep(&timer); + error = posix_cpu_timer_del(&timer); + } } - spin_unlock_irq(&timer.it_lock); - while (error == TIMER_RETRY) { - /* - * We need to handle case when timer was or is in the - * middle of firing. In other cases we already freed - * resources. - */ - spin_lock_irq(&timer.it_lock); - error = posix_cpu_timer_del(&timer); - spin_unlock_irq(&timer.it_lock); - } + spin_unlock_irq(&timer.it_lock); if ((it.it_value.tv_sec | it.it_value.tv_nsec) == 0) { /* @@ -1623,6 +1675,7 @@ const struct k_clock clock_posix_cpu = { .timer_del = posix_cpu_timer_del, .timer_get = posix_cpu_timer_get, .timer_rearm = posix_cpu_timer_rearm, + .timer_wait_running = posix_cpu_timer_wait_running, }; const struct k_clock clock_process = { diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 0c8a87a..808a247 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -846,6 +846,10 @@ static struct k_itimer *timer_wait_running(struct k_itimer *timer, rcu_read_lock(); unlock_timer(timer, *flags); + /* + * kc->timer_wait_running() might drop RCU lock. So @timer + * cannot be touched anymore after the function returns! + */ if (!WARN_ON_ONCE(!kc->timer_wait_running)) kc->timer_wait_running(timer);

2 years, 1 month

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror April 2023