Hi Greg, Sasha,
Recently, two MPTCP-related patches were not backported to the v6.1
tree due to conflicts:
- 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") [1]
- 63740448a32e ("mptcp: fix accept vs worker race") [2]
I have resolved the conflicts here, documented what I did in each patch,
and run our test suite. Everything looks OK.
These patches are based on top of the latest linux-stable-rc/linux-6.1.y
version.
Would you mind adding these two patches to the v6.1 queue, please?
[1] https://lore.kernel.org/r/2023042259-gravity-hate-a9a3@gregkh
[2] https://lore.kernel.org/r/2023042215-chastise-scuba-8478@gregkh
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Paolo Abeni (2):
mptcp: stops worker on unaccepted sockets at listener close
mptcp: fix accept vs worker race
net/mptcp/protocol.c | 74 +++++++++++++++++++++++++++++++++---------------
net/mptcp/protocol.h | 2 ++
net/mptcp/subflow.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 130 insertions(+), 26 deletions(-)
---
base-commit: e4ff6ff54dea67f94036a357201b0f9807405cc6
change-id: 20230424-upstream-stable-20230424-conflicts-6-1-f325fe76c540
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x f4e9e0e69468583c2c6d9d5c7bfc975e292bf188
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042253-speed-jolliness-682f@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
9760ebffbf55 ("mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator")
47d9644de92c ("nommu: convert nommu to using the vma iterator")
a27a11f92fe2 ("mm/mremap: use vmi version of vma_merge()")
076f16bf7698 ("mmap: use vmi version of vma_merge()")
0c0c5bffd0a2 ("mmap: pass through vmi iterator to __split_vma()")
178e22ac2078 ("madvise: use vmi iterator for __split_vma() and vma_merge()")
f10c2abcdac4 ("mempolicy: convert to vma iterator")
37598f5a9d8b ("mlock: convert mlock to vma iterator")
2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
11a9b90274f6 ("userfaultfd: use vma iterator")
f2ebfe43ba6c ("mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
5ab0fc155dc0 ("Sync mm-stable with mm-hotfixes-stable to pick up dependent patches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4e9e0e69468583c2c6d9d5c7bfc975e292bf188 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 10 Apr 2023 11:22:05 -0400
Subject: [PATCH] mm/mempolicy: fix use-after-free of VMA iterator
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
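The refactor is easier to see outside the kernel. Here is a rough userspace
model (nothing in it is kernel API; "struct range" and apply_one() are made
up for this sketch): the caller drives a single walk, and a split is linked
in through the node the caller is standing on, so the walk never goes stale.

#include <stdio.h>
#include <stdlib.h>

struct range { unsigned long start, end; struct range *next; };

/*
 * Apply a policy to one range, clipping it at @end.  The tail created by a
 * split is linked right after the current node, so the caller's walk picks
 * it up naturally -- the analogue of handing the VMA iterator down to
 * split_vma() instead of re-walking the tree inside mbind_range().
 * (The start-side split is omitted to keep the sketch short.)
 */
static void apply_one(struct range *r, unsigned long end)
{
        if (end < r->end) {
                struct range *tail = malloc(sizeof(*tail));

                tail->start = end;
                tail->end = r->end;
                tail->next = r->next;
                r->end = end;
                r->next = tail;
        }
        printf("policy applied to [%lu, %lu)\n", r->start, r->end);
}

int main(void)
{
        struct range last = { .start = 30, .end = 40, .next = NULL };
        struct range first = { .start = 0, .end = 30, .next = &last };

        /* The caller owns the walk, as do_mbind() and
         * set_mempolicy_home_node() now do. */
        for (struct range *r = &first; r && r->start < 35; r = r->next)
                apply_one(r, 35);

        return 0;
}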
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a256a241fd1d..2068b594dc88 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -790,61 +790,50 @@ static int vma_replace_policy(struct vm_area_struct *vma,
return err;
}
-/* Step 2: apply policy to a range and do splits. */
-static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+/* Split or merge the VMA (if required) and apply the new policy */
+static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, struct mempolicy *new_pol)
{
- VMA_ITERATOR(vmi, mm, start);
- struct vm_area_struct *prev;
- struct vm_area_struct *vma;
- int err = 0;
+ struct vm_area_struct *merged;
+ unsigned long vmstart, vmend;
pgoff_t pgoff;
+ int err;
- prev = vma_prev(&vmi);
- vma = vma_find(&vmi, end);
- if (WARN_ON(!vma))
+ vmend = min(end, vma->vm_end);
+ if (start > vma->vm_start) {
+ *prev = vma;
+ vmstart = start;
+ } else {
+ vmstart = vma->vm_start;
+ }
+
+ if (mpol_equal(vma_policy(vma), new_pol))
return 0;
- if (start > vma->vm_start)
- prev = vma;
-
- do {
- unsigned long vmstart = max(start, vma->vm_start);
- unsigned long vmend = min(end, vma->vm_end);
-
- if (mpol_equal(vma_policy(vma), new_pol))
- goto next;
-
- pgoff = vma->vm_pgoff +
- ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
- if (prev) {
- vma = prev;
- goto replace;
- }
- if (vma->vm_start != vmstart) {
- err = split_vma(&vmi, vma, vmstart, 1);
- if (err)
- goto out;
- }
- if (vma->vm_end != vmend) {
- err = split_vma(&vmi, vma, vmend, 0);
- if (err)
- goto out;
- }
-replace:
- err = vma_replace_policy(vma, new_pol);
+ pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
+ merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, new_pol,
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ if (merged) {
+ *prev = merged;
+ return vma_replace_policy(merged, new_pol);
+ }
+
+ if (vma->vm_start != vmstart) {
+ err = split_vma(vmi, vma, vmstart, 1);
if (err)
- goto out;
-next:
- prev = vma;
- } for_each_vma_range(vmi, vma, end);
+ return err;
+ }
-out:
- return err;
+ if (vma->vm_end != vmend) {
+ err = split_vma(vmi, vma, vmend, 0);
+ if (err)
+ return err;
+ }
+
+ *prev = vma;
+ return vma_replace_policy(vma, new_pol);
}
/* Set the process memory policy */
@@ -1259,6 +1248,8 @@ static long do_mbind(unsigned long start, unsigned long len,
nodemask_t *nmask, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev;
+ struct vma_iterator vmi;
struct mempolicy *new;
unsigned long end;
int err;
@@ -1328,7 +1319,13 @@ static long do_mbind(unsigned long start, unsigned long len,
goto up_out;
}
- err = mbind_range(mm, start, end, new);
+ vma_iter_init(&vmi, mm, start);
+ prev = vma_prev(&vmi);
+ for_each_vma_range(vmi, vma, end) {
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
+ if (err)
+ break;
+ }
if (!err) {
int nr_failed = 0;
@@ -1489,10 +1486,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
unsigned long, home_node, unsigned long, flags)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma, *prev;
struct mempolicy *new, *old;
- unsigned long vmstart;
- unsigned long vmend;
unsigned long end;
int err = -ENOENT;
VMA_ITERATOR(vmi, mm, start);
@@ -1521,6 +1516,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
if (end == start)
return 0;
mmap_write_lock(mm);
+ prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
* If any vma in the range got policy other than MPOL_BIND
@@ -1541,9 +1537,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
}
new->home_node = home_node;
- vmstart = max(start, vma->vm_start);
- vmend = min(end, vma->vm_end);
- err = mbind_range(mm, vmstart, vmend, new);
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
if (err)
break;
From: David Matlack <dmatlack(a)google.com>
[ Upstream commit 13ec9308a85702af7c31f3638a2720863848a7f2 ]
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can
detect if the results of vma_lookup() (e.g. vma_shift) become stale
before it acquires kvm->mmu_lock. This fixes a theoretical bug where a
VMA could be changed by userspace after vma_lookup() and before KVM
reads the mmu_invalidate_seq, causing KVM to install page table entries
based on a (possibly) no-longer-valid vma_shift.
Re-order the MMU cache top-up to earlier in user_mem_abort() so that it
is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid
inducing spurious fault retries).
This bug has existed since KVM/ARM's inception. It's unlikely that any
sane userspace currently modifies VMAs in such a way as to trigger this
race. And even with directed testing I was unable to reproduce it. But a
sufficiently motivated host userspace might be able to exploit this
race.
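The ordering is the whole fix, so here is a rough userspace model of it
(not KVM code; every name below is invented for illustration). Only the
variant that snapshots the sequence before "dropping the lock" notices the
racing invalidation:

#include <stdbool.h>
#include <stdio.h>

static unsigned long invalidate_seq;    /* stands in for kvm->mmu_invalidate_seq */

static void invalidate(void)            /* userspace changes the VMA layout */
{
        invalidate_seq++;
}

/*
 * Buggy ordering: do the lookup, drop the lock, and only then snapshot the
 * sequence.  An invalidation landing in between is already folded into the
 * snapshot, so the later check cannot see it.
 */
static bool install_buggy(void)
{
        /* vma_lookup() happens here, under mmap_lock */
        /* mmap_read_unlock() */
        invalidate();                           /* racing change hits the window */
        unsigned long seq = invalidate_seq;

        /* later, under kvm->mmu_lock: */
        return seq == invalidate_seq;           /* true -> stale vma_shift is used */
}

/*
 * Fixed ordering: snapshot the sequence before dropping mmap_lock, so the
 * same racing invalidation makes the final check fail and the fault is
 * simply retried.
 */
static bool install_fixed(void)
{
        /* vma_lookup() happens here, under mmap_lock */
        unsigned long seq = invalidate_seq;
        /* mmap_read_unlock() */
        invalidate();

        return seq == invalidate_seq;           /* false -> retry, nothing stale used */
}

int main(void)
{
        printf("buggy ordering installs the mapping: %s\n",
               install_buggy() ? "yes (stale)" : "no");
        printf("fixed ordering installs the mapping: %s\n",
               install_fixed() ? "yes" : "no (retried)");
        return 0;
}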
Fixes: 94f8e6418d39 ("KVM: ARM: Handle guest faults in KVM")
Cc: stable(a)vger.kernel.org # 6.1 only
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: David Matlack <dmatlack(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20230313235454.2964067-1-dmatlack@google.com
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
[will: Use FSC_PERM instead of ESR_ELx_FSC_PERM]
Signed-off-by: Will Deacon <will(a)kernel.org>
---
arch/arm64/kvm/mmu.c | 47 ++++++++++++++++++++------------------------
1 file changed, 21 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 019472dd98ff..54ccdcc2dbdf 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1178,6 +1178,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
+ /*
+ * Permission faults just need to update the existing leaf entry,
+ * and so normally don't require allocations from the memcache. The
+ * only exception to this is when dirty logging is enabled at runtime
+ * and a write fault needs to collapse a block entry into a table.
+ */
+ if (fault_status != FSC_PERM ||
+ (logging_active && write_fault)) {
+ ret = kvm_mmu_topup_memory_cache(memcache,
+ kvm_mmu_cache_min_pages(kvm));
+ if (ret)
+ return ret;
+ }
+
/*
* Let's check if we will get back a huge page backed by hugetlbfs, or
* get block mapping for device MMIO region.
@@ -1234,36 +1248,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
fault_ipa &= ~(vma_pagesize - 1);
gfn = fault_ipa >> PAGE_SHIFT;
- mmap_read_unlock(current->mm);
-
- /*
- * Permission faults just need to update the existing leaf entry,
- * and so normally don't require allocations from the memcache. The
- * only exception to this is when dirty logging is enabled at runtime
- * and a write fault needs to collapse a block entry into a table.
- */
- if (fault_status != FSC_PERM || (logging_active && write_fault)) {
- ret = kvm_mmu_topup_memory_cache(memcache,
- kvm_mmu_cache_min_pages(kvm));
- if (ret)
- return ret;
- }
- mmu_seq = vcpu->kvm->mmu_invalidate_seq;
/*
- * Ensure the read of mmu_invalidate_seq happens before we call
- * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
- * the page we just got a reference to gets unmapped before we have a
- * chance to grab the mmu_lock, which ensure that if the page gets
- * unmapped afterwards, the call to kvm_unmap_gfn will take it away
- * from us again properly. This smp_rmb() interacts with the smp_wmb()
- * in kvm_mmu_notifier_invalidate_<page|range_end>.
+ * Read mmu_invalidate_seq so that KVM can detect if the results of
+ * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to
+ * acquiring kvm->mmu_lock.
*
- * Besides, __gfn_to_pfn_memslot() instead of gfn_to_pfn_prot() is
- * used to avoid unnecessary overhead introduced to locate the memory
- * slot because it's always fixed even @gfn is adjusted for huge pages.
+ * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
+ * with the smp_wmb() in kvm_mmu_invalidate_end().
*/
- smp_rmb();
+ mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+ mmap_read_unlock(current->mm);
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
write_fault, &writable, NULL);
--
2.40.0.634.g4ca3ef3211-goog
From: Ziwei Dai <ziwei.dai(a)unisoc.com>
commit 5da7cb193db32da783a3f3e77d8b639989321d48 upstream.
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with a kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode. The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.
On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories. Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed. And
the kfree_rcu_monitor() function does in fact check for this.
Except that the kfree_rcu_monitor() function checks these pointers one
at a time. This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately. This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.
To see this, consider the following sequence of events, in which:
o Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
then is preempted.
o CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
after a later grace period. Except that "from_cset" is freed
right after the previous grace period ended, so that "from_cset"
is immediately freed. Task A resumes and references "from_cset"'s
member, after which nothing good happens.
In full detail:
CPU 0 CPU 1
---------------------- ----------------------
count_memcg_event_mm()
|rcu_read_lock() <---
|mem_cgroup_from_task()
|// css_set_ptr is the "from_cset" mentioned on CPU 1
|css_set_ptr = rcu_dereference((task)->cgroups)
|// Hard irq comes, current task is scheduled out.
cgroup_attach_task()
|cgroup_migrate()
|cgroup_migrate_execute()
|css_set_move_task(task, from_cset, to_cset, true)
|cgroup_move_task(task, to_cset)
|rcu_assign_pointer(.., to_cset)
|...
|cgroup_migrate_finish()
|put_css_set_locked(from_cset)
|from_cset->refcount return 0
|kfree_rcu(cset, rcu_head) // free from_cset after new gp
|add_ptr_to_bulk_krc_lock()
|schedule_delayed_work(&krcp->monitor_work, ..)
kfree_rcu_monitor()
|krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
|queue_rcu_work(system_wq, &krwp->rcu_work)
|if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
|call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp
// There is a previous call_rcu(.., rcu_work_rcufn)
// gp end, rcu_work_rcufn() is called.
rcu_work_rcufn()
|__queue_work(.., rwork->wq, &rwork->work);
|kfree_rcu_work()
|krwp->bulk_head_free[0] bulk is freed before new gp end!!!
|The "from_cset" is freed before new gp end.
// the task resumes some time later.
|css_set_ptr->subsys[subsys_id] <--- Caused kernel crash, because css_set_ptr is freed.
This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.
v2: Use helper function instead of inserted code block at kfree_rcu_monitor().
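As a rough userspace sketch of the decision change only (the arrays below
stand in for krcp->bkvhead[]/head and krwp->bkvhead_free[]/head_free; none
of this is kernel API), the mixed-category state from the scenario above is
handled differently by the old and new checks:

#include <stdbool.h>
#include <stdio.h>

#define N_CHANNELS 3    /* kfree bulk, kvfree bulk, emergency rcu_head list */

static bool pending[N_CHANNELS];        /* new memory queued on the krcp side */
static bool in_flight[N_CHANNELS];      /* memory on the krwp side, gp not done */

/* Old logic: attach as soon as *some* channel pair allows it. */
static bool detach_old(void)
{
        for (int i = 0; i < N_CHANNELS; i++)
                if (pending[i] && !in_flight[i])
                        return true;
        return false;
}

/*
 * New logic, mirroring need_wait_for_krwp_work(): never touch a krwp that
 * still has any channel waiting for its grace period; the monitor work is
 * simply re-armed instead.
 */
static bool detach_new(void)
{
        for (int i = 0; i < N_CHANNELS; i++)
                if (in_flight[i])
                        return false;
        for (int i = 0; i < N_CHANNELS; i++)
                if (pending[i])
                        return true;
        return false;
}

int main(void)
{
        in_flight[0] = true;    /* previous batch: only kfree() bulk, gp pending */
        pending[1] = true;      /* current batch: only kvfree() bulk */

        printf("old check detaches now: %s\n",
               detach_old() ? "yes (attaches to a busy krwp)" : "no");
        printf("new check detaches now: %s\n",
               detach_new() ? "yes" : "no (waits for the grace period)");
        return 0;
}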
[UR: backport to 6.2-stable]
Fixes: 34c881745549 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d620447 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai(a)unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
---
kernel/rcu/tree.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cf34a961821a..522129193771 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3131,6 +3131,18 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
return !!krcp->head;
}
+static bool
+need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp)
+{
+ int i;
+
+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ if (krwp->bkvhead_free[i])
+ return true;
+
+ return !!krwp->head_free;
+}
+
static void
schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
{
@@ -3162,14 +3174,13 @@ static void kfree_rcu_monitor(struct work_struct *work)
for (i = 0; i < KFREE_N_BATCHES; i++) {
struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
- // Try to detach bkvhead or head and attach it over any
- // available corresponding free channel. It can be that
- // a previous RCU batch is in progress, it means that
- // immediately to queue another one is not possible so
- // in that case the monitor work is rearmed.
- if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
- (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
- (krcp->head && !krwp->head_free)) {
+ // Try to detach bulk_head or head and attach it, only when
+ // all channels are free. Any channel is not free means at krwp
+ // there is on-going rcu work to handle krwp's free business.
+ if (need_wait_for_krwp_work(krwp))
+ continue;
+
+ if (need_offload_krc(krcp)) {
// Channel 1 corresponds to the SLAB-pointer bulk path.
// Channel 2 corresponds to vmalloc-pointer bulk path.
for (j = 0; j < FREE_N_CHANNELS; j++) {
--
2.30.2
From: Tobias Schramm <t.schramm(a)manjaro.org>
[ Upstream commit eca5bd666b0aa7dc0bca63292e4778968241134e ]
This commit fixes a race between completion of stop command and start of a
new command.
Previously the command ready interrupt was enabled before stop command
was written to the command register. This caused the command ready
interrupt to fire immediately since the CMDRDY flag is asserted constantly
while there is no command in progress.
Consequently the command state machine will immediately advance to the
next state when the tasklet function is executed again, regardless of the
actual completion state of the stop command.
Thus a new command can then be dispatched immediately, interrupting and
corrupting the stop command on the CMD line.
Fix that by dropping the command ready interrupt enable before calling
atmci_send_stop_cmd(). atmci_send_stop_cmd() already enables the command
ready interrupt, so no further writes to ATMCI_IER are necessary.
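A rough userspace model of that ordering (the names below only mimic the
driver; nothing here touches real hardware) shows why the old sequence
fires a spurious interrupt:

#include <stdio.h>

#define CMDRDY 0x1

static unsigned int status = CMDRDY;    /* CMDRDY stays asserted while idle */
static unsigned int ier;                /* interrupt enable mask */

static void maybe_irq(const char *when)
{
        if (status & ier)
                printf("  IRQ %s -- tasklet advances the state machine\n", when);
}

static void send_stop_cmd(void)
{
        status &= ~CMDRDY;      /* writing ATMCI_CMDR starts the stop command */
        ier |= CMDRDY;          /* atmci_send_stop_cmd() enables CMDRDY itself */
}

int main(void)
{
        /* Old flow: enable CMDRDY first.  It is still asserted from the
         * previous command, so the IRQ fires before the stop is even sent. */
        printf("old ordering:\n");
        ier |= CMDRDY;
        maybe_irq("before the stop command is written");

        /* New flow: let atmci_send_stop_cmd() do both, in the right order. */
        printf("new ordering:\n");
        status = CMDRDY;
        ier = 0;                /* reset the model */
        send_stop_cmd();
        maybe_irq("right after the stop command is written");   /* silent */
        status |= CMDRDY;       /* the stop command actually completes */
        maybe_irq("after the stop command completes");
        return 0;
}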
Signed-off-by: Tobias Schramm <t.schramm(a)manjaro.org>
Acked-by: Ludovic Desroches <ludovic.desroches(a)microchip.com>
Link: https://lore.kernel.org/r/20221230194315.809903-2-t.schramm@manjaro.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/mmc/host/atmel-mci.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/mmc/host/atmel-mci.c b/drivers/mmc/host/atmel-mci.c
index bb9bbf1c927b6..dd18440a90c58 100644
--- a/drivers/mmc/host/atmel-mci.c
+++ b/drivers/mmc/host/atmel-mci.c
@@ -1817,7 +1817,6 @@ static void atmci_tasklet_func(struct tasklet_struct *t)
atmci_writel(host, ATMCI_IER, ATMCI_NOTBUSY);
state = STATE_WAITING_NOTBUSY;
} else if (host->mrq->stop) {
- atmci_writel(host, ATMCI_IER, ATMCI_CMDRDY);
atmci_send_stop_cmd(host, data);
state = STATE_SENDING_STOP;
} else {
@@ -1850,8 +1849,6 @@ static void atmci_tasklet_func(struct tasklet_struct *t)
* command to send.
*/
if (host->mrq->stop) {
- atmci_writel(host, ATMCI_IER,
- ATMCI_CMDRDY);
atmci_send_stop_cmd(host, data);
state = STATE_SENDING_STOP;
} else {
--
2.39.2
Hi Greg and Sasha,
On Tue, 10 Aug 2021 16:45:34 +0000 SeongJae Park <sj38.park(a)gmail.com> wrote:
> From: SeongJae Park <sjpark(a)amazon.de>
>
> When running a test program, 'run_one()' checks if the program has the
> execution permission and fails if it doesn't. However, it's easy to
> mistakenly missing the permission, as some common tools like 'diff'
> don't support the permission change well[1]. Compared to that, making
> mistakes in the test program's path would only rare, as those are
> explicitly listed in 'TEST_PROGS'. Therefore, it might make more sense
> to resolve the situation on our own and run the program.
>
> For this reason, this commit makes the test program runner function
> still print the warning message but try to parse the interpreter of
> the program and explicitly run the program with that interpreter in
> such a case.
>
> [1] https://lore.kernel.org/mm-commits/YRJisBs9AunccCD4@kroah.com/
>
> Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
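The actual change is to the kselftest shell runner, but the idea quoted
above -- read the '#!' line and run the program with that interpreter while
still printing a warning -- can be sketched in C roughly as below (all of
it is a hypothetical illustration, not the real runner; "env"-style shebang
arguments are not handled):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Run @prog even when it lacks the execute bit: read its "#!" line and
 * invoke the named interpreter with the program as its argument. */
static int run_via_interpreter(const char *prog)
{
        char line[256];
        FILE *f = fopen(prog, "r");

        if (!f)
                return -1;
        if (!fgets(line, sizeof(line), f)) {
                fclose(f);
                return -1;
        }
        fclose(f);

        if (strncmp(line, "#!", 2) != 0)
                return -1;

        line[strcspn(line, " \n")] = '\0';      /* keep just the interpreter path */
        fprintf(stderr, "warning: %s is not executable, running it with %s\n",
                prog, line + 2);
        execl(line + 2, line + 2, prog, (char *)NULL);
        return -1;                              /* only reached if execl() fails */
}

int main(int argc, char **argv)
{
        if (argc == 2 && access(argv[1], X_OK) != 0)
                return run_via_interpreter(argv[1]);
        return 0;       /* executable (or no argument): nothing special to do */
}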
This patch has been merged into the mainline as commit 303f8e2d0200
("selftests/kselftest/runner/run_one(): allow running non-executable files").
However, it has not been added to v5.15.y, while some selftests, including
those for DAMON, still have no execution permission. As a result, those
selftests always fail unless this patch is applied manually. Could you
please add this patch to v5.15.y? I confirmed that it can be cleanly
cherry-picked on top of the latest v5.15.y.
Thanks,
SJ
[ Upstream commit 59f5ede3bc0f00eb856425f636dab0c10feb06d8 ]
The FPU usage related to task FPU management is either protected by
disabling interrupts (switch_to, return to user) or via fpregs_lock() which
is a wrapper around local_bh_disable(). When kernel code wants to use the
FPU then it has to check whether it is possible by calling irq_fpu_usable().
But the condition in irq_fpu_usable() is wrong. It allows FPU to be used
when:
!in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle()
The latter is checking whether some other context already uses FPU in the
kernel, but if that's not the case then it allows FPU to be used
unconditionally even if the calling context interrupted a fpregs_lock()
critical region. If that happens then the FPU state of the interrupted
context becomes corrupted.
Allow in kernel FPU usage only when no other context has in kernel FPU
usage and either the calling context is not hard interrupt context or the
hard interrupt did not interrupt a local bottomhalf disabled region.
It's hard to find a proper Fixes tag as the condition was broken in one way
or the other for a very long time and the eager/lazy FPU changes caused a
lot of churn. Picked something remotely connected from the history.
This survived undetected for quite some time as FPU usage in interrupt
context is rare, but the recent changes to the random code unearthed it at
least on a kernel which had FPU debugging enabled. There is probably a
higher rate of silent corruption as not all issues can be detected by the
FPU debugging code. This will be addressed in a subsequent change.
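A small userspace sketch of the predicate change (the flags below merely
stand in for in_irq(), softirq_count() and friends; this is not the kernel
code) shows the difference on exactly the dangerous state described above,
a hard interrupt that landed inside a fpregs_lock() critical region:

#include <stdbool.h>
#include <stdio.h>

struct ctx {
        bool in_nmi;
        bool in_hardirq;        /* in_irq() */
        bool bh_disabled;       /* softirq_count() != 0, e.g. fpregs_lock() held */
        bool user_mode;         /* interrupted user (or vm86) mode */
        bool in_kernel_fpu;     /* another context already uses the FPU */
};

static bool irq_fpu_usable_old(const struct ctx *c)
{
        bool in_interrupt = c->in_hardirq || c->bh_disabled;

        return !in_interrupt || c->user_mode || !c->in_kernel_fpu;
}

static bool irq_fpu_usable_new(const struct ctx *c)
{
        if (c->in_nmi || c->in_kernel_fpu)
                return false;
        if (!c->in_hardirq)
                return true;            /* task or softirq context */
        return !c->bh_disabled;         /* hard IRQ must not have hit fpregs_lock() */
}

int main(void)
{
        /* Hard interrupt that hit inside a fpregs_lock() critical region. */
        struct ctx c = { .in_hardirq = true, .bh_disabled = true };

        printf("old check: %s\n",
               irq_fpu_usable_old(&c) ? "usable (FPU state gets corrupted)" : "not usable");
        printf("new check: %s\n",
               irq_fpu_usable_new(&c) ? "usable" : "not usable (correct)");
        return 0;
}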
Fixes: 5d2bd7009f30 ("x86, fpu: decouple non-lazy/eager fpu restore from xsave")
Reported-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org
Cc: Can Sun <cansun(a)arista.com>
Link: https://lore.kernel.org/r/20220501193102.588689270@linutronix.de
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 571220ac8bea..835b948095cd 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -25,17 +25,7 @@
*/
union fpregs_state init_fpstate __read_mostly;
-/*
- * Track whether the kernel is using the FPU state
- * currently.
- *
- * This flag is used:
- *
- * - by IRQ context code to potentially use the FPU
- * if it's unused.
- *
- * - to debug kernel_fpu_begin()/end() correctness
- */
+/* Track in-kernel FPU usage */
static DEFINE_PER_CPU(bool, in_kernel_fpu);
/*
@@ -43,42 +33,37 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
*/
DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
-static bool kernel_fpu_disabled(void)
-{
- return this_cpu_read(in_kernel_fpu);
-}
-
-static bool interrupted_kernel_fpu_idle(void)
-{
- return !kernel_fpu_disabled();
-}
-
-/*
- * Were we in user mode (or vm86 mode) when we were
- * interrupted?
- *
- * Doing kernel_fpu_begin/end() is ok if we are running
- * in an interrupt context from user mode - we'll just
- * save the FPU state as required.
- */
-static bool interrupted_user_mode(void)
-{
- struct pt_regs *regs = get_irq_regs();
- return regs && user_mode(regs);
-}
-
/*
* Can we use the FPU in kernel mode with the
* whole "kernel_fpu_begin/end()" sequence?
- *
- * It's always ok in process context (ie "not interrupt")
- * but it is sometimes ok even from an irq.
*/
bool irq_fpu_usable(void)
{
- return !in_interrupt() ||
- interrupted_user_mode() ||
- interrupted_kernel_fpu_idle();
+ if (WARN_ON_ONCE(in_nmi()))
+ return false;
+
+ /* In kernel FPU usage already active? */
+ if (this_cpu_read(in_kernel_fpu))
+ return false;
+
+ /*
+ * When not in NMI or hard interrupt context, FPU can be used in:
+ *
+ * - Task context except from within fpregs_lock()'ed critical
+ * regions.
+ *
+ * - Soft interrupt processing context which cannot happen
+ * while in a fpregs_lock()'ed critical region.
+ */
+ if (!in_irq())
+ return true;
+
+ /*
+ * In hard interrupt context it's safe when soft interrupts
+ * are enabled, which means the interrupt did not hit in
+ * a fpregs_lock()'ed critical region.
+ */
+ return !softirq_count();
}
EXPORT_SYMBOL(irq_fpu_usable);