On Tue, Aug 23, 2022 at 07:20:14AM -0500, Bjorn Helgaas wrote:
> On Tue, Aug 23, 2022, 6:35 AM Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> wrote:
>
> > From: Stefan Roese <sr(a)denx.de>
> >
> > [ Upstream commit 8795e182b02dc87e343c79e73af6b8b7f9c5e635 ]
> >
>
> There's an open regression related to this commit:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216373
This is already in the following released stable kernels:
5.10.137 5.15.61 5.18.18 5.19.2
I'll go drop it from the 4.19 and 5.4 queues, but when this gets
resolved in Linus's tree, make sure there's a cc: stable on the fix so
that we know to backport it to the above branches as well. Or at the
least, a "Fixes:" tag.
thanks,
greg k-h
The patch titled
Subject: mm/shmem: Fix race in shmem_undo_range w/THP
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Stevens <stevensd(a)chromium.org>
Subject: mm/shmem: Fix race in shmem_undo_range w/THP
Date: Tue, 18 Apr 2023 17:40:31 +0900
Split folios during the second loop of shmem_undo_range. It's not
sufficient to only split folios when dealing with partial pages, since
it's possible for a THP to be faulted in after that point. Calling
truncate_inode_folio in that situation can result in throwing away data
outside of the range being targeted.
Link: https://lkml.kernel.org/r/20230418084031.3439795-1-stevensd@google.com
Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
Signed-off-by: David Stevens <stevensd(a)chromium.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Suleiman Souhlal <suleiman(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
--- a/mm/shmem.c~mm-shmem-fix-race-in-shmem_undo_range-w-thp
+++ a/mm/shmem.c
@@ -1022,7 +1022,22 @@ whole_folios:
}
VM_BUG_ON_FOLIO(folio_test_writeback(folio),
folio);
- truncate_inode_folio(mapping, folio);
+
+ if (!folio_test_large(folio)) {
+ truncate_inode_folio(mapping, folio);
+ } else if (truncate_inode_partial_folio(folio, lstart, lend)) {
+ /*
+ * If we split a page, reset the loop so that we
+ * pick up the new sub pages. Otherwise the THP
+ * was entirely dropped or the target range was
+ * zeroed, so just continue the loop as is.
+ */
+ if (!folio_test_large(folio)) {
+ folio_unlock(folio);
+ index = start;
+ break;
+ }
+ }
}
folio_unlock(folio);
}
_
Patches currently in -mm which might be from stevensd(a)chromium.org are
mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch
mm-khugepaged-drain-lru-after-swapping-in-shmem.patch
mm-khugepaged-refactor-collapse_file-control-flow.patch
mm-khugepaged-skip-shmem-with-userfaultfd.patch
mm-khugepaged-maintain-page-cache-uptodate-flag.patch
The patch titled
Subject: kasan: hw_tags: avoid invalid virt_to_page()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kasan-hw_tags-avoid-invalid-virt_to_page.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mark Rutland <mark.rutland(a)arm.com>
Subject: kasan: hw_tags: avoid invalid virt_to_page()
Date: Tue, 18 Apr 2023 17:42:12 +0100
When booting with 'kasan.vmalloc=off', a kernel configured with support
for KASAN_HW_TAGS will explode at boot time due to bogus use of
virt_to_page() on a vmalloc adddress. With CONFIG_DEBUG_VIRTUAL selected
this will be reported explicitly, and with or without CONFIG_DEBUG_VIRTUAL
the kernel will dereference a bogus address:
| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: (____ptrval____) (0xffff800008000000)
| WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x78/0x80
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc3-00073-g83865133300d-dirty #4
| Hardware name: linux,dummy-virt (DT)
| pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __virt_to_phys+0x78/0x80
| lr : __virt_to_phys+0x78/0x80
| sp : ffffcd076afd3c80
| x29: ffffcd076afd3c80 x28: 0068000000000f07 x27: ffff800008000000
| x26: fffffbfff0000000 x25: fffffbffff000000 x24: ff00000000000000
| x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000
| x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000
| x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004
| x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000003
| x11: 00000000ffffefff x10: c0000000ffffefff x9 : 0000000000000000
| x8 : 0000000000000000 x7 : 205d303030303030 x6 : 302e30202020205b
| x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : 000000000000004f
| Call trace:
| __virt_to_phys+0x78/0x80
| __kasan_unpoison_vmalloc+0xd4/0x478
| __vmalloc_node_range+0x77c/0x7b8
| __vmalloc_node+0x54/0x64
| init_IRQ+0x94/0xc8
| start_kernel+0x194/0x420
| __primary_switched+0xbc/0xc4
| ---[ end trace 0000000000000000 ]---
| Unable to handle kernel paging request at virtual address 03fffacbe27b8000
| Mem abort info:
| ESR = 0x0000000096000004
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x04: level 0 translation fault
| Data abort info:
| ISV = 0, ISS = 0x00000004
| CM = 0, WnR = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041bc5000
| [03fffacbe27b8000] pgd=0000000000000000, p4d=0000000000000000
| Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.3.0-rc3-00073-g83865133300d-dirty #4
| Hardware name: linux,dummy-virt (DT)
| pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __kasan_unpoison_vmalloc+0xe4/0x478
| lr : __kasan_unpoison_vmalloc+0xd4/0x478
| sp : ffffcd076afd3ca0
| x29: ffffcd076afd3ca0 x28: 0068000000000f07 x27: ffff800008000000
| x26: 0000000000000000 x25: 03fffacbe27b8000 x24: ff00000000000000
| x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000
| x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000
| x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004
| x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000001
| x11: 0000800008000000 x10: ffff800008000000 x9 : ffffb2f8dee00000
| x8 : 000ffffb2f8dee00 x7 : 205d303030303030 x6 : 302e30202020205b
| x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : ffffb2f8dee00000
| Call trace:
| __kasan_unpoison_vmalloc+0xe4/0x478
| __vmalloc_node_range+0x77c/0x7b8
| __vmalloc_node+0x54/0x64
| init_IRQ+0x94/0xc8
| start_kernel+0x194/0x420
| __primary_switched+0xbc/0xc4
| Code: d34cfc08 aa1f03fa 8b081b39 d503201f (f9400328)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill the idle task!
This is because init_vmalloc_pages() erroneously calls virt_to_page() on
a vmalloc address, while virt_to_page() is only valid for addresses in
the linear/direct map. Since init_vmalloc_pages() expects virtual
addresses in the vmalloc range, it must use vmalloc_to_page() rather
than virt_to_page().
We call init_vmalloc_pages() from __kasan_unpoison_vmalloc(), where we
check !is_vmalloc_or_module_addr(), suggesting that we might encounter a
non-vmalloc address. Luckily, this never happens. By design, we only
call __kasan_unpoison_vmalloc() on pointers in the vmalloc area, and I
have verified that we don't violate that expectation. Given that,
is_vmalloc_or_module_addr() must always be true for any legitimate
argument to __kasan_unpoison_vmalloc().
Correct init_vmalloc_pages() to use vmalloc_to_page(), and remove the
redundant and misleading use of is_vmalloc_or_module_addr() in
__kasan_unpoison_vmalloc().
Link: https://lkml.kernel.org/r/20230418164212.1775741-1-mark.rutland@arm.com
Fixes: 6c2f761dad7851d8 ("kasan: fix zeroing vmalloc memory with HW_TAGS")
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/hw_tags.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/kasan/hw_tags.c~kasan-hw_tags-avoid-invalid-virt_to_page
+++ a/mm/kasan/hw_tags.c
@@ -285,7 +285,7 @@ static void init_vmalloc_pages(const voi
const void *addr;
for (addr = start; addr < start + size; addr += PAGE_SIZE) {
- struct page *page = virt_to_page(addr);
+ struct page *page = vmalloc_to_page(addr);
clear_highpage_kasan_tagged(page);
}
@@ -297,7 +297,7 @@ void *__kasan_unpoison_vmalloc(const voi
u8 tag;
unsigned long redzone_start, redzone_size;
- if (!kasan_vmalloc_enabled() || !is_vmalloc_or_module_addr(start)) {
+ if (!kasan_vmalloc_enabled()) {
if (flags & KASAN_VMALLOC_INIT)
init_vmalloc_pages(start, size);
return (void *)start;
_
Patches currently in -mm which might be from mark.rutland(a)arm.com are
kasan-hw_tags-avoid-invalid-virt_to_page.patch
Hi all,
This series is a backport of upstream commit e89c2e815e76 ("riscv:
Handle zicsr/zifencei issues between clang and binutils") to
linux-5.10.y, with the necessary machinery for CONFIG_AS_IS_GNU and
CONFIG_AS_VERSION, which that commit requires.
While the middle two patches are not strictly necessary, they are good
clean ups that ensure consistency with mainline. The first three changes
are already present in 5.15, so there is no risk of a regression moving
forward.
If there are any issues, please let me know.
NOTE: I am sending this series with 'b4 send', as that is what I am used
to at this point. Please accept my apologies if this causes any issues.
---
Masahiro Yamada (2):
kbuild: check the minimum assembler version in Kconfig
kbuild: check CONFIG_AS_IS_LLVM instead of LLVM_IAS
Nathan Chancellor (2):
kbuild: Switch to 'f' variants of integrated assembler flag
riscv: Handle zicsr/zifencei issues between clang and binutils
Makefile | 8 +++---
arch/riscv/Kconfig | 22 ++++++++++++++++
arch/riscv/Makefile | 12 +++++----
init/Kconfig | 12 +++++++++
scripts/Kconfig.include | 6 +++++
scripts/as-version.sh | 69 +++++++++++++++++++++++++++++++++++++++++++++++++
scripts/dummy-tools/gcc | 6 +++++
7 files changed, 127 insertions(+), 8 deletions(-)
---
base-commit: ca9787bdecfa2174b0a169a54916e22b89b0ef5b
change-id: 20230328-riscv-zifencei-zicsr-5-10-65596f2cac9e
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
On Mon, Apr 17, 2023 at 09:27:22PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> purgatory: fix disabling debug info
>
> to the 5.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> purgatory-fix-disabling-debug-info.patch
> and it can be found in the queue-5.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
There's no need for this patch on 5.15, as the regression it fixes was
only introduced in 6.0. It won't do any harm though — is it considered
good practice to keep the code in sync between stable kernels to make
backports of other patches easier? If so, it would make sense to
backport after all.
> commit 618ea690941689fe28fa9c150f90bb096db5f8a5
> Author: Alyssa Ross <hi(a)alyssa.is>
> Date: Sun Mar 26 18:21:21 2023 +0000
>
> purgatory: fix disabling debug info
>
> [ Upstream commit d83806c4c0cccc0d6d3c3581a11983a9c186a138 ]
>
> Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
> Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
> -Wa versions of both of those if building with Clang and GNU as. As a
> result, debug info was being generated for the purgatory objects, even
> though the intention was that it not be.
>
> Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files")
> Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
> Cc: stable(a)vger.kernel.org
> Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
> Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..1d6ccd4214d5a 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -64,8 +64,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
> CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
> CFLAGS_string.o += $(PURGATORY_CFLAGS)
>
> -AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
> -AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
> +asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
>
> $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> $(call if_changed,ld)
When we try to unshare a pinned page for a private hugetlb, uffd-wp bit can
get lost during unsharing. Fix it by carrying it over.
This should be very rare, only if an unsharing happened on a private
hugetlb page with uffd-wp protected (e.g. in a child which shares the same
page with parent with UFFD_FEATURE_EVENT_FORK enabled).
Cc: linux-stable <stable(a)vger.kernel.org>
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Reported-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/hugetlb.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0213efaf31be..cd3a9d8f4b70 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5637,13 +5637,16 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
spin_lock(ptl);
ptep = hugetlb_walk(vma, haddr, huge_page_size(h));
if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
mmu_notifier_invalidate_range(mm, range.start, range.end);
page_remove_rmap(old_page, vma, true);
hugepage_add_new_anon_rmap(new_folio, vma, haddr);
- set_huge_pte_at(mm, haddr, ptep,
- make_huge_pte(vma, &new_folio->page, !unshare));
+ if (huge_pte_uffd_wp(pte))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(mm, haddr, ptep, newpte);
folio_set_hugetlb_migratable(new_folio);
/* Make the old page be freed below */
new_folio = page_folio(old_page);
--
2.39.1
This is a resend of
https://lore.kernel.org/stable/20230308162207.2886641-1-qyousef@layalina.io/
Which was dropped because of build errors on 5.10 equivalent backport.
I extended the testing to make sure this series is not impacted like 5.10
backport. And update the cover letter to clarify there's no need to take
further backports which removes capacity inversion detection.
Portion of the fixes were ported in 5.15 but missed some.
This ports the remainder of the fixes.
Based on 5.15.98.
a2e90611b9f4 ("sched/fair: Remove capacity inversion detection") is not
necessary to backport because it has a dependency on e5ed0550c04c ("sched/fair:
unlink misfit task from cpu overutilized") which is nice to have but not
strictly required. It improves the search for best CPU under adverse thermal
pressure to try harder. And the new search effectively replaces the capacity
inversion detection, so it is removed afterwards.
Build tested on (cross compile when necessary; x86_64 otherwise):
1. default ubuntu config which has uclamp + smp
2. default ubuntu config without uclamp + smp
3. default ubunto config without smp (which automatically disables
uclamp)
4. reported riscv-allnoconfig, mips-randconfig, x86_64-randocnfigs
Boot tested on android 5.15 GKI with slight modifications due to other
conflicts there. I need more time to be able to do full functional testing on
5.15 - but since some patches were already taken - posting the remainder now.
Sorry due to job/email change I missed the emails when the other backports were
partially taken.
Qais Yousef (7):
sched/uclamp: Fix fits_capacity() check in feec()
sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early
exit condition
sched/fair: Detect capacity inversion
sched/fair: Consider capacity inversion in util_fits_cpu()
sched/uclamp: Fix a uninitialized variable warnings
sched/fair: Fixes for capacity inversion detection
kernel/sched/core.c | 10 ++--
kernel/sched/fair.c | 128 +++++++++++++++++++++++++++++++++++++------
kernel/sched/sched.h | 61 ++++++++++++++++++++-
3 files changed, 174 insertions(+), 25 deletions(-)
--
2.25.1
Portion of the fixes were ported to 6.1 but 3 were missed.
This ports the remainder of the fixes.
Based on 6.1.24.
a2e90611b9f4 ("sched/fair: Remove capacity inversion detection") is not
necessary to backport because it has a dependency on e5ed0550c04c ("sched/fair:
unlink misfit task from cpu overutilized") which is nice to have but not
strictly required. It improves the search for best CPU under adverse thermal
pressure to try harder. And the new search effectively replaces the capacity
inversion detection, so it is removed afterwards.
Build tested on (cross compile when necessary; x86_64 otherwise):
1. default ubuntu config which has uclamp + smp
2. default ubuntu config without uclamp + smp
3. default ubunto config without smp (which automatically disables
uclamp)
4. reported riscv-allnoconfig, mips-randconfig, x86_64-randocnfigs
Boot tested on x86 qemu environment only.
Qais Yousef (3):
sched/fair: Detect capacity inversion
sched/fair: Consider capacity inversion in util_fits_cpu()
sched/fair: Fixes for capacity inversion detection
kernel/sched/fair.c | 86 +++++++++++++++++++++++++++++++++++++++-----
kernel/sched/sched.h | 19 ++++++++++
2 files changed, 97 insertions(+), 8 deletions(-)
--
2.25.1
This is a resend of
https://lore.kernel.org/stable/20230308162207.2886641-1-qyousef@layalina.io/
Which was dropped because of build errors on 5.10 equivalent backport.
I extended the testing to make sure this series is not impacted like 5.10
backport. And update the cover letter to clarify there's no need to take
further backports which removes capacity inversion detection.
Portion of the fixes were ported in 5.15 but missed some.
This ports the remainder of the fixes.
Based on 5.15.98.
a2e90611b9f4 ("sched/fair: Remove capacity inversion detection") is not
necessary to backport because it has a dependency on e5ed0550c04c ("sched/fair:
unlink misfit task from cpu overutilized") which is nice to have but not
strictly required. It improves the search for best CPU under adverse thermal
pressure to try harder. And the new search effectively replaces the capacity
inversion detection, so it is removed afterwards.
Build tested on (cross compile when necessary; x86_64 otherwise):
1. default ubuntu config which has uclamp + smp
2. default ubuntu config without uclamp + smp
3. default ubunto config without smp (which automatically disables
uclamp)
4. reported riscv-allnoconfig, mips-randconfig, x86_64-randocnfigs
Boot tested on android 5.15 GKI with slight modifications due to other
conflicts there. I need more time to be able to do full functional testing on
5.15 - but since some patches were already taken - posting the remainder now.
Sorry due to job/email change I missed the emails when the other backports were
partially taken.
Qais Yousef (7):
sched/uclamp: Fix fits_capacity() check in feec()
sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early
exit condition
sched/fair: Detect capacity inversion
sched/fair: Consider capacity inversion in util_fits_cpu()
sched/uclamp: Fix a uninitialized variable warnings
sched/fair: Fixes for capacity inversion detection
kernel/sched/core.c | 10 ++--
kernel/sched/fair.c | 128 +++++++++++++++++++++++++++++++++++++------
kernel/sched/sched.h | 61 ++++++++++++++++++++-
3 files changed, 174 insertions(+), 25 deletions(-)
--
2.25.1
On Sun, Apr 16, 2023 at 03:22:18PM +0800, cuigaosheng wrote:
> On 2023/4/15 23:07, Greg KH wrote:
> > On Sat, Apr 15, 2023 at 06:11:55PM +0800, Gaosheng Cui wrote:
> > > From: Herbert Xu <herbert(a)gondor.apana.org.au>
> > >
> > > When complex algorithms that depend on other algorithms are built
> > > into the kernel, the order of registration must be done such that
> > > the underlying algorithms are ready before the ones on top are
> > > registered. As otherwise they would fail during the self-test
> > > which is required during registration.
> > >
> > > In the past we have used subsystem initialisation ordering to
> > > guarantee this. The number of such precedence levels are limited
> > > and they may cause ripple effects in other subsystems.
> > >
> > > This patch solves this problem by delaying all self-tests during
> > > boot-up for built-in algorithms. They will be tested either when
> > > something else in the kernel requests for them, or when we have
> > > finished registering all built-in algorithms, whichever comes
> > > earlier.
> > >
> > > Reported-by: Vladis Dronov <vdronov(a)redhat.com>
> > > Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
> > > Signed-off-by: Gaosheng Cui <cuigaosheng1(a)huawei.com>
> > > ---
> > > crypto/algapi.c | 73 +++++++++++++++++++++++++++++++++--------------
> > > crypto/api.c | 52 +++++++++++++++++++++++++++++----
> > > crypto/internal.h | 10 +++++++
> > > 3 files changed, 108 insertions(+), 27 deletions(-)
> > What is the git commit id of this, and the other 3 patches, in Linus's
> > tree? That is required to have here, as you know.
> >
> > thanks,
> >
> > greg k-h
> > .
>
> Thanks for taking time to review these patch.
>
> These patches are in Linus's tree, reference as follows:
> Reference 1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> Reference 2: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> Reference 3: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> Reference 4: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Please resend the patches with the git commit id in the changelog
somewhere, as is normally done (there are thousands of examples on the
mailing list.)
Also be sure that you are also backporting the patches to newer kernel
releases so that someone does not upgrade and have a regression (i.e. if
a patch is also needed in 5.15.y send a backport for that too.)
Thanks,
greg k-h
Changes in v2:
* Fix compilation error against patch 7 due to misiplace #endif to
protect against CONFIG_SMP which doesn't contain the newly added
field to struct rq.
Commit 2ff401441711 ("sched/uclamp: Fix relationship between uclamp and
migration margin") was cherry-picked into 5.10 kernels but missed the rest of
the series.
This ports the remainder of the fixes.
Based on 5.10.172.
NOTE:
a2e90611b9f4 ("sched/fair: Remove capacity inversion detection") is not
necessary to backport because it has a dependency on e5ed0550c04c ("sched/fair:
unlink misfit task from cpu overutilized") which is nice to have but not
strictly required. It improves the search for best CPU under adverse thermal
pressure to try harder. And the new search effectively replaces the capacity
inversion detection, so it is removed afterwards.
Build tested on (cross compile when necessary; x86_64 otherwise):
1. default ubuntu config which has uclamp + smp
2. default ubuntu config without uclamp + smp
3. default ubunto config without smp (which automatically disables
uclamp)
4. reported riscv-allnoconfig, mips-randconfig, x86_64-randocnfigs
Tested on 5.10 Android GKI kernel and android device (with slight modifications
due to other conflicts on there).
Qais Yousef (10):
sched/uclamp: Make task_fits_capacity() use util_fits_cpu()
sched/uclamp: Fix fits_capacity() check in feec()
sched/uclamp: Make select_idle_capacity() use util_fits_cpu()
sched/uclamp: Make asym_fits_capacity() use util_fits_cpu()
sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early
exit condition
sched/fair: Detect capacity inversion
sched/fair: Consider capacity inversion in util_fits_cpu()
sched/uclamp: Fix a uninitialized variable warnings
sched/fair: Fixes for capacity inversion detection
kernel/sched/core.c | 10 +--
kernel/sched/fair.c | 183 ++++++++++++++++++++++++++++++++++---------
kernel/sched/sched.h | 70 ++++++++++++++++-
3 files changed, 217 insertions(+), 46 deletions(-)
--
2.25.1
Hi Greg,
This 5.4.y backport series contains XFS fixes from v5.11 & v5.12. The
patchset has been acked by Darrick.
As a side note, where applicable, patches have been cherry picked from
5.10.y rather than from v5.11/v5.12.
Brian Foster (2):
xfs: consider shutdown in bmapbt cursor delete assert
xfs: don't reuse busy extents on extent trim
Christoph Hellwig (10):
xfs: merge the projid fields in struct xfs_icdinode
xfs: ensure that the inode uid/gid match values match the icdinode
ones
xfs: remove the icdinode di_uid/di_gid members
xfs: remove the kuid/kgid conversion wrappers
xfs: add a new xfs_sb_version_has_v3inode helper
xfs: only check the superblock version for dinode size calculation
xfs: simplify di_flags2 inheritance in xfs_ialloc
xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize
xfs: remove the di_version field from struct icdinode
xfs: fix up non-directory creation in SGID directories
Darrick J. Wong (3):
xfs: report corruption only as a regular error
xfs: shut down the filesystem if we screw up quota reservation
xfs: force log and push AIL to clear pinned inodes when aborting mount
Jeffrey Mitchell (1):
xfs: set inode size after creating symlink
Kaixu Xia (1):
xfs: show the proper user quota options
fs/xfs/libxfs/xfs_attr_leaf.c | 5 +-
fs/xfs/libxfs/xfs_bmap.c | 10 ++--
fs/xfs/libxfs/xfs_btree.c | 30 +++++-------
fs/xfs/libxfs/xfs_format.h | 33 ++++++++++---
fs/xfs/libxfs/xfs_ialloc.c | 6 +--
fs/xfs/libxfs/xfs_inode_buf.c | 54 +++++++-------------
fs/xfs/libxfs/xfs_inode_buf.h | 8 +--
fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
fs/xfs/libxfs/xfs_inode_fork.h | 9 +---
fs/xfs/libxfs/xfs_log_format.h | 10 ++--
fs/xfs/libxfs/xfs_trans_resv.c | 2 +-
fs/xfs/xfs_acl.c | 12 +++--
fs/xfs/xfs_bmap_util.c | 16 +++---
fs/xfs/xfs_buf_item.c | 2 +-
fs/xfs/xfs_dquot.c | 6 +--
fs/xfs/xfs_error.c | 2 +-
fs/xfs/xfs_extent_busy.c | 14 ------
fs/xfs/xfs_icache.c | 8 ++-
fs/xfs/xfs_inode.c | 61 ++++++++---------------
fs/xfs/xfs_inode.h | 21 +-------
fs/xfs/xfs_inode_item.c | 20 ++++----
fs/xfs/xfs_ioctl.c | 22 ++++-----
fs/xfs/xfs_iops.c | 11 +----
fs/xfs/xfs_itable.c | 8 +--
fs/xfs/xfs_linux.h | 32 +++---------
fs/xfs/xfs_log_recover.c | 6 +--
fs/xfs/xfs_mount.c | 90 +++++++++++++++++-----------------
fs/xfs/xfs_qm.c | 43 +++++++++-------
fs/xfs/xfs_qm_bhv.c | 2 +-
fs/xfs/xfs_quota.h | 4 +-
fs/xfs/xfs_super.c | 10 ++--
fs/xfs/xfs_symlink.c | 7 ++-
fs/xfs/xfs_trans_dquot.c | 16 ++++--
33 files changed, 248 insertions(+), 334 deletions(-)
--
2.39.1
Apparently despite it being marked inline, the compiler
may not inline __down_read_common() which makes it difficult
to identify the cause of lock contention, as the blocked
function will always be listed as __down_read_common().
So this patch adds __sched annotation to the function so
the calling function will instead be listed.
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Murray <timmurray(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Waiman Long <longman(a)redhat.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: kernel-team(a)android.com
Cc: stable(a)vger.kernel.org
Fixes: c995e638ccbb ("locking/rwsem: Fold __down_{read,write}*()")
Reported-by: Tim Murray <timmurray(a)google.com>
Signed-off-by: John Stultz <jstultz(a)google.com>
---
kernel/locking/rwsem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index acb5a50309a1..cde2f22e65a8 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1240,7 +1240,7 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
/*
* lock for reading
*/
-static inline int __down_read_common(struct rw_semaphore *sem, int state)
+static inline __sched int __down_read_common(struct rw_semaphore *sem, int state)
{
int ret = 0;
long count;
--
2.40.0.577.gac1e443424-goog
Hi!
The Tegra20 requires an enabled VDE power domain during startup. As the
VDE is currently not used, it is disabled during runtime.
Since 8f0c714ad9be, there is a workaround for the "normal restart path"
which enables the VDE before doing PMC's warm reboot. This workaround is
not executed in the "emergency restart path", leading to a hang-up
during start.
This series implements and registers a new pmic-based restart handler
for boards with tps6586x. This cold reboot ensures that the VDE power
domain is enabled during startup on tegra20-based boards.
Since bae1d3a05a8b, i2c transfers are non-atomic while preemption is
disabled (which is e.g. done during panic()). This could lead to
warnings ("Voluntary context switch within RCU") in i2c-based restart
handlers during emergency restart. The state of preemption should be
detected by i2c_in_atomic_xfer_mode() to use atomic i2c xfer when
required. Beside the new system_state check, the check is the same as
the one pre v5.2.
v4: https://lore.kernel.org/r/20230327-tegra-pmic-reboot-v4-0-b24af219fb47@skid…
v3: https://lore.kernel.org/r/20230327-tegra-pmic-reboot-v3-0-3c0ee3567e14@skid…
v2: https://lore.kernel.org/all/20230320220345.1463687-1-bbara93@gmail.com/
system_state: https://lore.kernel.org/all/20230320213230.1459532-1-bbara93@gmail.com/
v1: https://lore.kernel.org/all/20230316164703.1157813-1-bbara93@gmail.com/
v5:
- introduce new 3 & 4, therefore 3 -> 5, 4 -> 6
- 3: provide dev to sys_off handler, if it is known
- 4: return NOTIFY_DONE from sys_off_notify, to avoid skipping
- 5: drop Reviewed-by from Dmitry, add poweroff timeout
- 5,6: return notifier value instead of direct errno from handler
- 5,6: use new dev field instead of passing dev as cb_data
- 5,6: increase timeout values based on error observations
- 6: skip unsupported reboot modes in restart handler
v4:
- 1,2: add "Fixes" and adapt commit messages
- 4: reduce delay after requesting the restart (as suggested by Dmitry)
v3:
- bring system_state back in this series
- do atomic i2c xfer if not preemptible (as suggested by Dmitry)
- fix style issues mentioned by Dmitry
- add cc stable as suggested by Dmitry
- add explanation why this is needed for Jon
v2:
- use devm-based restart handler
- convert the existing power_off handler to a devm-based handler
- handle system_state in extra series
---
Benjamin Bara (6):
kernel/reboot: emergency_restart: set correct system_state
i2c: core: run atomic i2c xfer when !preemptible
kernel/reboot: add device to sys_off_handler
kernel/reboot: sys_off_notify: always return NOTIFY_DONE
mfd: tps6586x: use devm-based power off handler
mfd: tps6586x: register restart handler
drivers/i2c/i2c-core.h | 2 +-
drivers/mfd/tps6586x.c | 55 ++++++++++++++++++++++++++++++++++++++++++--------
include/linux/reboot.h | 3 +++
kernel/reboot.c | 11 +++++++++-
4 files changed, 61 insertions(+), 10 deletions(-)
---
base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
change-id: 20230327-tegra-pmic-reboot-4175ff814a4b
Best regards,
--
Benjamin Bara <benjamin.bara(a)skidata.com>
This is for pre-6.4 kernels, as scrub code goes through a huge rework.
[BUG]
Even before the scrub rework, if we have some corrupted metadata failed
to be repaired during replace, we still continue replace and let it
finish just as there is nothing wrong:
BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started
BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1
BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5
BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5
BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752
BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1
BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished
This can lead to unexpected problems for the result fs.
[CAUSE]
Btrfs reuses scrub code path for dev-replace to iterate all dev extents.
But unlike scrub, dev-replace doesn't really bother to check the scrub
progress, which records all the errors found during replace.
And even if we checks the progress, we can not really determine which
errors are minor, which are critical just by the plain numbers.
(remember we don't treat metadata/data checksum error differently).
This behavior is there from the very beginning.
[FIX]
Instead of continue the replace, just error out if we hit an unrepaired
metadata sector.
Now the dev-replace would be rejected with -EIO, to inform the user.
Although it also means, the fs has some metadata error which can not be
repaired, the user would be super upset anyway.
The new dmesg would look like this:
BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started
BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
BTRFS error (device dm-4): unable to fixup (regular) error at logical 5570560 on dev /dev/mapper/test-scratch1 physical 5570560
BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5
BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5
BTRFS error (device dm-4): stripe 5570560 has unrepaired metadata sector at 5578752
BTRFS error (device dm-4): btrfs_scrub_dev(/dev/mapper/test-scratch1, 1, /dev/mapper/test-scratch2) failed -5
CC: stable(a)vger.kernel.org
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
I'm not sure how should we merge this patch.
The misc-next is already merging the new scrub code, but the problem is
there for all old kernels thus we need such fixes.
Maybe we can merge this fix before the scrub rework, then the rework,
and finally the better fix using reworked interface?
---
fs/btrfs/scrub.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ef4046a2572c..71f64b9bcd9f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -195,6 +195,7 @@ struct scrub_ctx {
struct mutex wr_lock;
struct btrfs_device *wr_tgtdev;
bool flush_all_writes;
+ bool has_meta_failed;
/*
* statistics
@@ -1380,6 +1381,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
btrfs_err_rl_in_rcu(fs_info,
"unable to fixup (regular) error at logical %llu on dev %s",
logical, btrfs_dev_name(dev));
+ if (is_metadata)
+ sctx->has_meta_failed = true;
}
out:
@@ -3838,6 +3841,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
blk_finish_plug(&plug);
+ /*
+ * If we have metadata unable to be repaired, we should error
+ * out the dev-replace.
+ */
+ if (sctx->is_dev_replace && sctx->has_meta_failed && ret >= 0)
+ ret = -EIO;
if (sctx->is_dev_replace && ret >= 0) {
int ret2;
--
2.39.2
Antgroup is using 5.10.y in product environment, we found several patches are
missing in 5.10.y tree. These patches are needed for us. So we backported them
to 5.10.y
Connor Kuehl (1):
virtiofs: split requests that exceed virtqueue size
Jiachen Zhang (1):
fuse: always revalidate rename target dentry
Miklos Szeredi (4):
virtiofs: clean up error handling in virtio_fs_get_tree()
fuse: check s_root when destroying sb
fuse: fix attr version comparison in fuse_read_update_size()
fuse: fix deadlock between atomic O_TRUNC and page invalidation
fs/fuse/dir.c | 7 ++++++-
fs/fuse/file.c | 31 +++++++++++++++++-------------
fs/fuse/fuse_i.h | 3 +++
fs/fuse/inode.c | 5 +++--
fs/fuse/virtio_fs.c | 46 +++++++++++++++++++++++++++++----------------
5 files changed, 60 insertions(+), 32 deletions(-)
--
2.40.0
Hi stable maintainers,
This is a backport patch for commit df205b5c6328 ("KVM: arm64: Filter
out invalid core register IDs in KVM_GET_REG_LIST") to 4.14 and 4.19.
This commit was not applied to the 4.14-stable tree due to merge
conflict [1]. To backport this, cherry-picked commit be25bbb392fa
("KVM: arm64: Factor out core register ID enumeration") that has no
functional changes but makes it easy to merge, and commit 5d8d4af24460
("arm64: KVM: Fix system register enumeration") that is a fix patch for
the commit.
I'd appreciate it if you could consider backporting this to 4.14 and
4.19.
Best regards,
Takahiro
[1] https://lore.kernel.org/all/1560343489-22906-1-git-send-email-Dave.Martin@a…
Dave Martin (2):
KVM: arm64: Factor out core register ID enumeration
KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
arch/arm64/kvm/guest.c | 79 ++++++++++++++++++++++++++++++++++--------
1 file changed, 65 insertions(+), 14 deletions(-)
--
2.39.2
v2 -> v3:
* Cherry-pick an additional fix patch.
* Link to v2: https://lore.kernel.org/all/20230404034649.77915-1-itazur@amazon.com/
v1 -> v2:
* Fix a compile error for core_reg_size_from_offset().
* Link to v1: https://lore.kernel.org/all/20230403223028.45131-1-itazur@amazon.com/
From: David Stevens <stevensd(a)chromium.org>
Split folios during the second loop of shmem_undo_range. It's not
sufficient to only split folios when dealing with partial pages, since
it's possible for a THP to be faulted in after that point. Calling
truncate_inode_folio in that situation can result in throwing away data
outside of the range being targeted.
Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Stevens <stevensd(a)chromium.org>
---
v1 -> v2:
- Actually drop pages after splitting a THP
mm/shmem.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 9218c955f482..226c94a257b1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1033,7 +1033,22 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
}
VM_BUG_ON_FOLIO(folio_test_writeback(folio),
folio);
- truncate_inode_folio(mapping, folio);
+
+ if (!folio_test_large(folio)) {
+ truncate_inode_folio(mapping, folio);
+ } else if (truncate_inode_partial_folio(folio, lstart, lend)) {
+ /*
+ * If we split a page, reset the loop so that we
+ * pick up the new sub pages. Otherwise the THP
+ * was entirely dropped or the target range was
+ * zeroed, so just continue the loop as is.
+ */
+ if (!folio_test_large(folio)) {
+ folio_unlock(folio);
+ index = start;
+ break;
+ }
+ }
}
folio_unlock(folio);
}
--
2.40.0.634.g4ca3ef3211-goog
From: David Stevens <stevensd(a)chromium.org>
Split folios during the second loop of shmem_undo_range. It's not
sufficient to only split folios when dealing with partial pages, since
it's possible for a THP to be faulted in after that point. Calling
truncate_inode_folio in that situation can result in throwing away data
outside of the range being targeted.
Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Stevens <stevensd(a)chromium.org>
---
mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 9218c955f482..317cbeb0fb6b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1033,7 +1033,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
}
VM_BUG_ON_FOLIO(folio_test_writeback(folio),
folio);
- truncate_inode_folio(mapping, folio);
+ truncate_inode_partial_folio(folio, lstart, lend);
}
folio_unlock(folio);
}
--
2.40.0.634.g4ca3ef3211-goog
From: Damien Le Moal <dlemoal(a)kernel.org>
The address translation unit of the rockchip EP controller does not use
the lower 8 bits of a PCIe-space address to map local memory. Thus we
must set the align feature field to 256 to let the user know about this
constraint.
Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Rick Wertenbroek <rick.wertenbroek(a)gmail.com>
---
drivers/pci/controller/pcie-rockchip-ep.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index edfced311a9f..0af0e965fb57 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -442,6 +442,7 @@ static const struct pci_epc_features rockchip_pcie_epc_features = {
.linkup_notifier = false,
.msi_capable = true,
.msix_capable = false,
+ .align = 256,
};
static const struct pci_epc_features*
--
2.25.1
The RK3399 PCIe endpoint controller cannot generate MSI-X IRQs.
This is documented in the RK3399 technical reference manual (TRM)
section 17.5.9 "Interrupt Support".
MSI-X capability should therefore not be advertised. Remove the
MSI-X capability by editing the capability linked-list. The
previous entry is the MSI capability, therefore get the next
entry from the MSI-X capability entry and set it as next entry
for the MSI capability. This in effect removes MSI-X from the list.
Linked list before : MSI cap -> MSI-X cap -> PCIe Device cap -> ...
Linked list now : MSI cap -> PCIe Device cap -> ...
Tested-by: Damien Le Moal <dlemoal(a)kernel.org>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rick Wertenbroek <rick.wertenbroek(a)gmail.com>
---
drivers/pci/controller/pcie-rockchip-ep.c | 24 +++++++++++++++++++++++
drivers/pci/controller/pcie-rockchip.h | 5 +++++
2 files changed, 29 insertions(+)
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index 63fbb379638b..edfced311a9f 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -507,6 +507,7 @@ static int rockchip_pcie_ep_probe(struct platform_device *pdev)
size_t max_regions;
struct pci_epc_mem_window *windows = NULL;
int err, i;
+ u32 cfg_msi, cfg_msix_cp;
ep = devm_kzalloc(dev, sizeof(*ep), GFP_KERNEL);
if (!ep)
@@ -582,6 +583,29 @@ static int rockchip_pcie_ep_probe(struct platform_device *pdev)
ep->irq_pci_addr = ROCKCHIP_PCIE_EP_DUMMY_IRQ_ADDR;
+ /*
+ * MSI-X is not supported but the controller still advertises the MSI-X
+ * capability by default, which can lead to the Root Complex side
+ * allocating MSI-X vectors which cannot be used. Avoid this by skipping
+ * the MSI-X capability entry in the PCIe capabilities linked-list: get
+ * the next pointer from the MSI-X entry and set that in the MSI
+ * capability entry (which is the previous entry). This way the MSI-X
+ * entry is skipped (left out of the linked-list) and not advertised.
+ */
+ cfg_msi = rockchip_pcie_read(rockchip, PCIE_EP_CONFIG_BASE +
+ ROCKCHIP_PCIE_EP_MSI_CTRL_REG);
+
+ cfg_msi &= ~ROCKCHIP_PCIE_EP_MSI_CP1_MASK;
+
+ cfg_msix_cp = rockchip_pcie_read(rockchip, PCIE_EP_CONFIG_BASE +
+ ROCKCHIP_PCIE_EP_MSIX_CAP_REG) &
+ ROCKCHIP_PCIE_EP_MSIX_CAP_CP_MASK;
+
+ cfg_msi |= cfg_msix_cp;
+
+ rockchip_pcie_write(rockchip, cfg_msi,
+ PCIE_EP_CONFIG_BASE + ROCKCHIP_PCIE_EP_MSI_CTRL_REG);
+
rockchip_pcie_write(rockchip, PCIE_CLIENT_CONF_ENABLE,
PCIE_CLIENT_CONFIG);
diff --git a/drivers/pci/controller/pcie-rockchip.h b/drivers/pci/controller/pcie-rockchip.h
index 501d859420b4..fe0333778fd9 100644
--- a/drivers/pci/controller/pcie-rockchip.h
+++ b/drivers/pci/controller/pcie-rockchip.h
@@ -227,6 +227,8 @@
#define ROCKCHIP_PCIE_EP_CMD_STATUS 0x4
#define ROCKCHIP_PCIE_EP_CMD_STATUS_IS BIT(19)
#define ROCKCHIP_PCIE_EP_MSI_CTRL_REG 0x90
+#define ROCKCHIP_PCIE_EP_MSI_CP1_OFFSET 8
+#define ROCKCHIP_PCIE_EP_MSI_CP1_MASK GENMASK(15, 8)
#define ROCKCHIP_PCIE_EP_MSI_FLAGS_OFFSET 16
#define ROCKCHIP_PCIE_EP_MSI_CTRL_MMC_OFFSET 17
#define ROCKCHIP_PCIE_EP_MSI_CTRL_MMC_MASK GENMASK(19, 17)
@@ -234,6 +236,9 @@
#define ROCKCHIP_PCIE_EP_MSI_CTRL_MME_MASK GENMASK(22, 20)
#define ROCKCHIP_PCIE_EP_MSI_CTRL_ME BIT(16)
#define ROCKCHIP_PCIE_EP_MSI_CTRL_MASK_MSI_CAP BIT(24)
+#define ROCKCHIP_PCIE_EP_MSIX_CAP_REG 0xb0
+#define ROCKCHIP_PCIE_EP_MSIX_CAP_CP_OFFSET 8
+#define ROCKCHIP_PCIE_EP_MSIX_CAP_CP_MASK GENMASK(15, 8)
#define ROCKCHIP_PCIE_EP_DUMMY_IRQ_ADDR 0x1
#define ROCKCHIP_PCIE_EP_PCI_LEGACY_IRQ_ADDR 0x3
#define ROCKCHIP_PCIE_EP_FUNC_BASE(fn) \
--
2.25.1
Assert PCI Configuration Enable bit after probe. When this bit is left to
0 in the endpoint mode, the RK3399 PCIe endpoint core will generate
configuration request retry status (CRS) messages back to the root complex.
Assert this bit after probe to allow the RK3399 PCIe endpoint core to reply
to configuration requests from the root complex.
This is documented in section 17.5.8.1.2 of the RK3399 TRM.
Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Cc: stable(a)vger.kernel.org
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Tested-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Rick Wertenbroek <rick.wertenbroek(a)gmail.com>
---
drivers/pci/controller/pcie-rockchip-ep.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index 9b835377bd9e..d00baed65eba 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -623,6 +623,9 @@ static int rockchip_pcie_ep_probe(struct platform_device *pdev)
ep->irq_pci_addr = ROCKCHIP_PCIE_EP_DUMMY_IRQ_ADDR;
+ rockchip_pcie_write(rockchip, PCIE_CLIENT_CONF_ENABLE,
+ PCIE_CLIENT_CONFIG);
+
return 0;
err_epc_mem_exit:
pci_epc_mem_exit(epc);
--
2.25.1
It was observed that there are hosts that may complete pending SETUP
transactions before the stop active transfers and controller halt occurs,
leading to lingering endxfer commands on DEPs on subsequent pullup/gadget
start iterations.
dwc3_gadget_ep_disable name=ep8in flags=0x3009 direction=1
dwc3_gadget_ep_disable name=ep4in flags=1 direction=1
dwc3_gadget_ep_disable name=ep3out flags=1 direction=0
usb_gadget_disconnect deactivated=0 connected=0 ret=0
The sequence shows that the USB gadget disconnect (dwc3_gadget_pullup(0))
routine completed successfully, allowing for the USB gadget to proceed with
a USB gadget connect. However, if this occurs the system runs into an
issue where:
BUG: spinlock already unlocked on CPU
spin_bug+0x0
dwc3_remove_requests+0x278
dwc3_ep0_out_start+0xb0
__dwc3_gadget_start+0x25c
This is due to the pending endxfers, leading to gadget start (w/o lock
held) to execute the remove requests, which will unlock the dwc3
spinlock as part of giveback.
To mitigate this, resolve the pending endxfers on the pullup disable
path by re-locating the SETUP phase check after stop active transfers, since
that is where the DWC3_EP_DELAY_STOP is potentially set. This also allows
for handling of a host that may be unresponsive by using the completion
timeout to trigger the stall and restart for EP0.
Fixes: c96683798e27 ("usb: dwc3: ep0: Don't prepare beyond Setup stage")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
---
drivers/usb/dwc3/gadget.c | 49 +++++++++++++++++++++++++--------------
1 file changed, 32 insertions(+), 17 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 3c63fa97a680..be84c133f0d7 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2528,29 +2528,17 @@ static int __dwc3_gadget_start(struct dwc3 *dwc);
static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
{
unsigned long flags;
+ int ret;
spin_lock_irqsave(&dwc->lock, flags);
dwc->connected = false;
/*
- * Per databook, when we want to stop the gadget, if a control transfer
- * is still in process, complete it and get the core into setup phase.
+ * Attempt to end pending SETUP status phase, and not wait for the
+ * function to do so.
*/
- if (dwc->ep0state != EP0_SETUP_PHASE) {
- int ret;
-
- if (dwc->delayed_status)
- dwc3_ep0_send_delayed_status(dwc);
-
- reinit_completion(&dwc->ep0_in_setup);
-
- spin_unlock_irqrestore(&dwc->lock, flags);
- ret = wait_for_completion_timeout(&dwc->ep0_in_setup,
- msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
- spin_lock_irqsave(&dwc->lock, flags);
- if (ret == 0)
- dev_warn(dwc->dev, "timed out waiting for SETUP phase\n");
- }
+ if (dwc->delayed_status)
+ dwc3_ep0_send_delayed_status(dwc);
/*
* In the Synopsys DesignWare Cores USB3 Databook Rev. 3.30a
@@ -2563,6 +2551,33 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
__dwc3_gadget_stop(dwc);
spin_unlock_irqrestore(&dwc->lock, flags);
+ /*
+ * Per databook, when we want to stop the gadget, if a control transfer
+ * is still in process, complete it and get the core into setup phase.
+ * In case the host is unresponsive to a SETUP transaction, forcefully
+ * stall the transfer, and move back to the SETUP phase, so that any
+ * pending endxfers can be executed.
+ */
+ if (dwc->ep0state != EP0_SETUP_PHASE) {
+ reinit_completion(&dwc->ep0_in_setup);
+
+ ret = wait_for_completion_timeout(&dwc->ep0_in_setup,
+ msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
+ if (ret == 0) {
+ unsigned int dir;
+
+ dev_warn(dwc->dev, "wait for SETUP phase timed out\n");
+ spin_lock_irqsave(&dwc->lock, flags);
+ dir = !!dwc->ep0_expect_in;
+ if (dwc->ep0state == EP0_DATA_PHASE)
+ dwc3_ep0_end_control_data(dwc, dwc->eps[dir]);
+ else
+ dwc3_ep0_end_control_data(dwc, dwc->eps[!dir]);
+ dwc3_ep0_stall_and_restart(dwc);
+ spin_unlock_irqrestore(&dwc->lock, flags);
+ }
+ }
+
/*
* Note: if the GEVNTCOUNT indicates events in the event buffer, the
* driver needs to acknowledge them before the controller can halt.
The patch titled
Subject: ia64: fix an addr to taddr in huge_pte_offset()
has been added to the -mm mm-nonmm-unstable branch. Its filename is
ia64-fix-an-addr-to-taddr-in-huge_pte_offset.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: ia64: fix an addr to taddr in huge_pte_offset()
Date: Sun, 16 Apr 2023 22:17:05 -0700 (PDT)
I know nothing of ia64 htlbpage_to_page(), but guess that the p4d
line should be using taddr rather than addr, like everywhere else.
Link: https://lkml.kernel.org/r/732eae88-3beb-246-2c72-281de786740@google.com
Fixes: c03ab9e32a2c ("ia64: add support for folded p4d page tables")
Signed-off-by: Hugh Dickins <hughd(a)google.com
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/ia64/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/ia64/mm/hugetlbpage.c~ia64-fix-an-addr-to-taddr-in-huge_pte_offset
+++ a/arch/ia64/mm/hugetlbpage.c
@@ -58,7 +58,7 @@ huge_pte_offset (struct mm_struct *mm, u
pgd = pgd_offset(mm, taddr);
if (pgd_present(*pgd)) {
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_offset(pgd, taddr);
if (p4d_present(*p4d)) {
pud = pud_offset(p4d, taddr);
if (pud_present(*pud)) {
_
Patches currently in -mm which might be from hughd(a)google.com are
ia64-fix-an-addr-to-taddr-in-huge_pte_offset.patch
Mi dispiace disturbarti e invadere la tua privacy. Sono single,
solitario e bisognoso di un compagno premuroso, amorevole e romantico.
Sono un ammiratore segreto e vorrei esplorare l'opportunità di farlo
saperne di più l'uno sull'altro. So che è strano contattarti
in questo modo e spero che tu possa perdonarmi. Sono una persona timida
e
questo è l'unico modo in cui so di poter attirare la tua attenzione.
Voglio semplicemente
per sapere cosa ne pensate e la mia intenzione non è di offendervi.
Spero che possiamo essere amici se è quello che vuoi, anche se lo vorrei
essere più di un semplice amico. So che hai alcune domande da fare
chiedi e spero di poter soddisfare alcune delle tue curiosità con alcuni
risposte.
Credo nel detto che "per il mondo sei solo una persona,
ma per qualcuno di speciale tu sei il mondo'. Tutto quello che voglio è
amore,
cure e attenzioni romantiche da una compagna speciale quale sono io
sperando saresti tu.
Spero che questo messaggio sia l'inizio di un lungo periodo
comunicazione tra di noi, è sufficiente inviare una risposta a questo
messaggio, it
mi renderà felice.
Baci e abbracci,
Mario.
On my RTW8821CU chipset rfe_option reads as 0x22. Looking at the
vendor driver suggests that the field width of rfe_option is 5 bit,
so rfe_option should be masked with 0x1f.
Without this the rfe_option comparisons with 2 further down the
driver evaluate as false when they should really evaluate as true.
The effect is that 2G channels do not work.
rfe_option is also used as an array index into rtw8821c_rfe_defs[].
rtw8821c_rfe_defs[34] (0x22) was added as part of adding USB support,
likely because rfe_option reads as 0x22. As this now becomes 0x2,
rtw8821c_rfe_defs[34] is no longer used and can be removed.
Note that this might not be the whole truth. In the vendor driver
there are indeed places where the unmasked rfe_option value is used.
However, the driver has several places where rfe_option is tested
with the pattern if (rfe_option == 2 || rfe_option == 0x22) or
if (rfe_option == 4 || rfe_option == 0x24), so that rfe_option BIT(5)
has no influence on the code path taken. We therefore mask BIT(5)
out from rfe_option entirely until this assumption is proved wrong
by some chip variant we do not know yet.
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Tested-by: Alexandru gagniuc <mr.nuke.me(a)gmail.com>
Tested-by: Larry Finger <Larry.Finger(a)lwfinger.net>
Tested-by: ValdikSS <iam(a)valdikss.org.ru>
Cc: stable(a)vger.kernel.org
---
drivers/net/wireless/realtek/rtw88/rtw8821c.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/rtw8821c.c b/drivers/net/wireless/realtek/rtw88/rtw8821c.c
index 17f800f6efbd0..67efa58dd78ee 100644
--- a/drivers/net/wireless/realtek/rtw88/rtw8821c.c
+++ b/drivers/net/wireless/realtek/rtw88/rtw8821c.c
@@ -47,7 +47,7 @@ static int rtw8821c_read_efuse(struct rtw_dev *rtwdev, u8 *log_map)
map = (struct rtw8821c_efuse *)log_map;
- efuse->rfe_option = map->rfe_option;
+ efuse->rfe_option = map->rfe_option & 0x1f;
efuse->rf_board_option = map->rf_board_option;
efuse->crystal_cap = map->xtal_k;
efuse->pa_type_2g = map->pa_type;
@@ -1537,7 +1537,6 @@ static const struct rtw_rfe_def rtw8821c_rfe_defs[] = {
[2] = RTW_DEF_RFE_EXT(8821c, 0, 0, 2),
[4] = RTW_DEF_RFE_EXT(8821c, 0, 0, 2),
[6] = RTW_DEF_RFE(8821c, 0, 0),
- [34] = RTW_DEF_RFE(8821c, 0, 0),
};
static struct rtw_hw_reg rtw8821c_dig[] = {
--
2.39.2
Hello Dear,
I am sorry to bother you and intrude on your privacy. I am single,
lonely and in need of a caring, loving, and romantic companion.
I am a secret admirer and would like to explore the opportunity to
learn more about each other. I know it is strange to contact you
this way and I hope you can forgive me. I am a shy person and
this is the only way I know I could get your attention. I just want
to know what you think and my intention is not to offend you.
I hope we can be friends if that is what you want, although I wish
to be more than just a friend. I know you have a few questions to
ask and I hope I can satisfy some of your curiosity with a few
answers.
I believe in the saying that 'to the world, you are just one person,
but to someone special, you are the world'. All I want is love,
romantic care and attention from a special companion which I am
hoping would be you.
I hope this message will be the beginning of a long term
communication between us, simply send a reply to this message, it
will make me happy.
Hugs and kisses,
Secret admirer.
Hej min kära,
Jag är ledsen att jag stör dig och inkräktar på din integritet. Jag är
singel, ensam och i behov av en omtänksam, kärleksfull och romantisk
följeslagare.
Jag är en hemlig beundrare och skulle vilja utforska möjligheten att
lära mig mer om varandra. Jag vet att det är konstigt att kontakta dig
på det här sättet och jag hoppas att du kan förlåta mig. Jag är en blyg
person och det är det enda sättet jag vet att jag kan få din
uppmärksamhet. Jag vill bara veta vad du tycker och min avsikt är inte
att förolämpa dig.
Jag hoppas att vi kan vara vänner om det är vad du vill, även om jag
vill vara mer än bara en vän. Jag vet att du har några frågor att ställa
och jag hoppas att jag kan tillfredsställa en del av din nyfikenhet med
några svar.
Jag tror på talesättet att "för världen är du bara en person, men för
någon speciell är du världen". Allt jag vill ha är kärlek, romantisk
omsorg och uppmärksamhet från en speciell följeslagare som jag hoppas
skulle vara du.
Jag hoppas att detta meddelande kommer att bli början på en långsiktig
kommunikation mellan oss, skicka bara ett svar på detta meddelande, det
kommer att göra mig glad.
Puss och kram,
Marion.
The patch titled
Subject: nilfs2: initialize unused bytes in segment summary blocks
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-initialize-unused-bytes-in-segment-summary-blocks.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: initialize unused bytes in segment summary blocks
Date: Tue, 18 Apr 2023 02:35:13 +0900
Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for
KMSAN enabled kernels after applying commit 7397031622e0 ("nilfs2:
initialize "struct nilfs_binfo_dat"->bi_pad field").
This is because the unused bytes at the end of each block in segment
summaries are not initialized. So this fixes the issue by padding the
unused bytes with null bytes.
Link: https://lkml.kernel.org/r/20230417173513.12598-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+048585f3f4227bb2b49b(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b
Cc: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
--- a/fs/nilfs2/segment.c~nilfs2-initialize-unused-bytes-in-segment-summary-blocks
+++ a/fs/nilfs2/segment.c
@@ -430,6 +430,23 @@ static int nilfs_segctor_reset_segment_b
return 0;
}
+/**
+ * nilfs_segctor_zeropad_segsum - zero pad the rest of the segment summary area
+ * @sci: segment constructor object
+ *
+ * nilfs_segctor_zeropad_segsum() zero-fills unallocated space at the end of
+ * the current segment summary block.
+ */
+static void nilfs_segctor_zeropad_segsum(struct nilfs_sc_info *sci)
+{
+ struct nilfs_segsum_pointer *ssp;
+
+ ssp = sci->sc_blk_cnt > 0 ? &sci->sc_binfo_ptr : &sci->sc_finfo_ptr;
+ if (ssp->offset < ssp->bh->b_size)
+ memset(ssp->bh->b_data + ssp->offset, 0,
+ ssp->bh->b_size - ssp->offset);
+}
+
static int nilfs_segctor_feed_segment(struct nilfs_sc_info *sci)
{
sci->sc_nblk_this_inc += sci->sc_curseg->sb_sum.nblocks;
@@ -438,6 +455,7 @@ static int nilfs_segctor_feed_segment(st
* The current segment is filled up
* (internal code)
*/
+ nilfs_segctor_zeropad_segsum(sci);
sci->sc_curseg = NILFS_NEXT_SEGBUF(sci->sc_curseg);
return nilfs_segctor_reset_segment_buffer(sci);
}
@@ -542,6 +560,7 @@ static int nilfs_segctor_add_file_block(
goto retry;
}
if (unlikely(required)) {
+ nilfs_segctor_zeropad_segsum(sci);
err = nilfs_segbuf_extend_segsum(segbuf);
if (unlikely(err))
goto failed;
@@ -1533,6 +1552,7 @@ static int nilfs_segctor_collect(struct
nadd = min_t(int, nadd << 1, SC_MAX_SEGDELTA);
sci->sc_stage = prev_stage;
}
+ nilfs_segctor_zeropad_segsum(sci);
nilfs_segctor_truncate_segments(sci, sci->sc_curseg, nilfs->ns_sufile);
return 0;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-initialize-unused-bytes-in-segment-summary-blocks.patch
The patch titled
Subject: mm/hugetlb: fix uffd-wp bit lost when unsharing happens
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-uffd-wp-bit-lost-when-unsharing-happens.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix uffd-wp bit lost when unsharing happens
Date: Mon, 17 Apr 2023 15:53:13 -0400
When we try to unshare a pinned page for a private hugetlb, uffd-wp bit
can get lost during unsharing. Fix it by carrying it over.
This should be very rare, only if an unsharing happened on a private
hugetlb page with uffd-wp protected (e.g. in a child which shares the
same page with parent with UFFD_FEATURE_EVENT_FORK enabled).
Link: https://lkml.kernel.org/r/20230417195317.898696-3-peterx@redhat.com
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mika Penttil�� <mpenttil(a)redhat.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wp-bit-lost-when-unsharing-happens
+++ a/mm/hugetlb.c
@@ -5644,13 +5644,16 @@ retry_avoidcopy:
spin_lock(ptl);
ptep = hugetlb_walk(vma, haddr, huge_page_size(h));
if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
mmu_notifier_invalidate_range(mm, range.start, range.end);
page_remove_rmap(old_page, vma, true);
hugepage_add_new_anon_rmap(new_folio, vma, haddr);
- set_huge_pte_at(mm, haddr, ptep,
- make_huge_pte(vma, &new_folio->page, !unshare));
+ if (huge_pte_uffd_wp(pte))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(mm, haddr, ptep, newpte);
folio_set_hugetlb_migratable(new_folio);
/* Make the old page be freed below */
new_folio = page_folio(old_page);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
selftests-mm-update-gitignore-with-two-missing-tests.patch
selftests-mm-dump-a-summary-in-run_vmtestssh.patch
selftests-mm-merge-utilh-into-vm_utilh.patch
selftests-mm-use-test_gen_progs-where-proper.patch
selftests-mm-link-vm_utilc-always.patch
selftests-mm-merge-default_huge_page_size-into-one.patch
selftests-mm-use-pm_-macros-in-vm_utilsh.patch
selftests-mm-reuse-pagemap_get_entry-in-vm_utilh.patch
selftests-mm-test-uffdio_zeropage-only-when-hugetlb.patch
selftests-mm-drop-test_uffdio_zeropage_eexist.patch
selftests-mm-create-uffd-common.patch
selftests-mm-split-uffd-tests-into-uffd-stress-and-uffd-unit-tests.patch
selftests-mm-uffd_register.patch
selftests-mm-uffd_open_devsys.patch
selftests-mm-uffdio_api-test.patch
selftests-mm-drop-global-mem_fd-in-uffd-tests.patch
selftests-mm-drop-global-hpage_size-in-uffd-tests.patch
selftests-mm-rename-uffd_stats-to-uffd_args.patch
selftests-mm-let-uffd_handle_page_fault-take-wp-parameter.patch
selftests-mm-allow-allocate_area-to-fail-properly.patch
selftests-mm-add-framework-for-uffd-unit-test.patch
selftests-mm-move-uffd-pagemap-test-to-unit-test.patch
selftests-mm-move-uffd-minor-test-to-unit-test.patch
selftests-mm-move-uffd-sig-events-tests-into-uffd-unit-tests.patch
selftests-mm-move-zeropage-test-into-uffd-unit-tests.patch
selftests-mm-workaround-no-way-to-detect-uffd-minor-wp.patch
selftests-mm-allow-uffd-test-to-skip-properly-with-no-privilege.patch
selftests-mm-drop-sys-dev-test-in-uffd-stress-test.patch
selftests-mm-add-shmem-private-test-to-uffd-stress.patch
selftests-mm-add-uffdio-register-ioctls-test.patch
mm-hugetlb-fix-uffd-wp-during-fork.patch
mm-hugetlb-fix-uffd-wp-bit-lost-when-unsharing-happens.patch
selftests-mm-add-a-few-options-for-uffd-unit-test.patch
selftests-mm-extend-and-rename-uffd-pagemap-test.patch
selftests-mm-rename-cow_extra_libs-to-iouring_extra_libs.patch
selftests-mm-add-tests-for-ro-pinning-vs-fork.patch
The patch titled
Subject: mm/hugetlb: fix uffd-wp during fork()
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-uffd-wp-during-fork.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix uffd-wp during fork()
Date: Mon, 17 Apr 2023 15:53:12 -0400
Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins",
v2.
This patch (of 6):
There're a bunch of things that were wrong:
- Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
rather than huge_pte_uffd_wp().
- When copying over a pte, we should drop uffd-wp bit when
!EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
- When doing early CoW for private hugetlb (e.g. when the parent page was
pinned), uffd-wp bit should be properly carried over if necessary.
No bug reported probably because most people do not even care about these
corner cases, but they are still bugs and can be exposed by the recent unit
tests introduced, so fix all of them in one shot.
Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mika Penttil�� <mpenttil(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wp-during-fork
+++ a/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(
static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
- struct folio *new_folio)
+ struct folio *new_folio, pte_t old)
{
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
__folio_mark_uptodate(new_folio);
hugepage_add_new_anon_rmap(new_folio, vma, addr);
- set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5032,14 +5036,12 @@ again:
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
- bool uffd_wp = huge_pte_uffd_wp(entry);
-
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
- bool uffd_wp = huge_pte_uffd_wp(entry);
+ bool uffd_wp = pte_swp_uffd_wp(entry);
if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5050,10 +5052,10 @@ again:
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
if (userfaultfd_wp(src_vma) && uffd_wp)
- entry = huge_pte_mkuffd_wp(entry);
+ entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
}
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
@@ -5118,7 +5120,8 @@ again:
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+ hugetlb_install_folio(dst_vma, dst_pte, addr,
+ new_folio, src_pte_old);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5136,6 +5139,9 @@ again:
entry = huge_pte_wrprotect(entry);
}
+ if (!userfaultfd_wp(dst_vma))
+ entry = huge_pte_clear_uffd_wp(entry);
+
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
_
Patches currently in -mm which might be from peterx(a)redhat.com are
selftests-mm-update-gitignore-with-two-missing-tests.patch
selftests-mm-dump-a-summary-in-run_vmtestssh.patch
selftests-mm-merge-utilh-into-vm_utilh.patch
selftests-mm-use-test_gen_progs-where-proper.patch
selftests-mm-link-vm_utilc-always.patch
selftests-mm-merge-default_huge_page_size-into-one.patch
selftests-mm-use-pm_-macros-in-vm_utilsh.patch
selftests-mm-reuse-pagemap_get_entry-in-vm_utilh.patch
selftests-mm-test-uffdio_zeropage-only-when-hugetlb.patch
selftests-mm-drop-test_uffdio_zeropage_eexist.patch
selftests-mm-create-uffd-common.patch
selftests-mm-split-uffd-tests-into-uffd-stress-and-uffd-unit-tests.patch
selftests-mm-uffd_register.patch
selftests-mm-uffd_open_devsys.patch
selftests-mm-uffdio_api-test.patch
selftests-mm-drop-global-mem_fd-in-uffd-tests.patch
selftests-mm-drop-global-hpage_size-in-uffd-tests.patch
selftests-mm-rename-uffd_stats-to-uffd_args.patch
selftests-mm-let-uffd_handle_page_fault-take-wp-parameter.patch
selftests-mm-allow-allocate_area-to-fail-properly.patch
selftests-mm-add-framework-for-uffd-unit-test.patch
selftests-mm-move-uffd-pagemap-test-to-unit-test.patch
selftests-mm-move-uffd-minor-test-to-unit-test.patch
selftests-mm-move-uffd-sig-events-tests-into-uffd-unit-tests.patch
selftests-mm-move-zeropage-test-into-uffd-unit-tests.patch
selftests-mm-workaround-no-way-to-detect-uffd-minor-wp.patch
selftests-mm-allow-uffd-test-to-skip-properly-with-no-privilege.patch
selftests-mm-drop-sys-dev-test-in-uffd-stress-test.patch
selftests-mm-add-shmem-private-test-to-uffd-stress.patch
selftests-mm-add-uffdio-register-ioctls-test.patch
mm-hugetlb-fix-uffd-wp-during-fork.patch
mm-hugetlb-fix-uffd-wp-bit-lost-when-unsharing-happens.patch
selftests-mm-add-a-few-options-for-uffd-unit-test.patch
selftests-mm-extend-and-rename-uffd-pagemap-test.patch
selftests-mm-rename-cow_extra_libs-to-iouring_extra_libs.patch
selftests-mm-add-tests-for-ro-pinning-vs-fork.patch
There're a bunch of things that were wrong:
- Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
rather than huge_pte_uffd_wp().
- When copying over a pte, we should drop uffd-wp bit when
!EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
- When doing early CoW for private hugetlb (e.g. when the parent page was
pinned), uffd-wp bit should be properly carried over if necessary.
No bug reported probably because most people do not even care about these
corner cases, but they are still bugs and can be exposed by the recent unit
tests introduced, so fix all of them in one shot.
Cc: linux-stable <stable(a)vger.kernel.org>
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/hugetlb.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f16b25b1a6b9..0213efaf31be 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
- struct folio *new_folio)
+ struct folio *new_folio, pte_t old)
{
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
__folio_mark_uptodate(new_folio);
hugepage_add_new_anon_rmap(new_folio, vma, addr);
- set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5032,14 +5036,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
- bool uffd_wp = huge_pte_uffd_wp(entry);
-
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
- bool uffd_wp = huge_pte_uffd_wp(entry);
+ bool uffd_wp = pte_swp_uffd_wp(entry);
if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5050,10 +5052,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
if (userfaultfd_wp(src_vma) && uffd_wp)
- entry = huge_pte_mkuffd_wp(entry);
+ entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
}
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
@@ -5114,7 +5116,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+ hugetlb_install_folio(dst_vma, dst_pte, addr,
+ new_folio, src_pte_old);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5132,6 +5135,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
entry = huge_pte_wrprotect(entry);
}
+ if (!userfaultfd_wp(dst_vma))
+ entry = huge_pte_clear_uffd_wp(entry);
+
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
--
2.39.1
Several calls to ci_dpm_fini() will attempt to free resources that
either have been freed before or haven't been allocated yet. This
may lead to undefined or dangerous behaviour.
For instance, if r600_parse_extended_power_table() fails, it might
call r600_free_extended_power_table() as will ci_dpm_fini() later
during error handling.
Fix this by only freeing pointers to objects previously allocated.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: cc8dbbb4f62a ("drm/radeon: add dpm support for CI dGPUs (v2)")
Cc: stable(a)vger.kernel.org
Co-developed-by: Natalia Petrova <n.petrova(a)fintech.ru>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
---
v2: free only resouces allocated prior, do not remove ci_dpm_fini()
or other deallocating calls altogether; fix commit message.
v1: https://lore.kernel.org/all/20230403182808.8699-1-n.zhandarovich@fintech.ru/
drivers/gpu/drm/radeon/ci_dpm.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
index 8ef25ab305ae..b8f4dac68d85 100644
--- a/drivers/gpu/drm/radeon/ci_dpm.c
+++ b/drivers/gpu/drm/radeon/ci_dpm.c
@@ -5517,6 +5517,7 @@ static int ci_parse_power_table(struct radeon_device *rdev)
u8 frev, crev;
u8 *power_state_offset;
struct ci_ps *ps;
+ int ret;
if (!atom_parse_data_header(mode_info->atom_context, index, NULL,
&frev, &crev, &data_offset))
@@ -5546,11 +5547,15 @@ static int ci_parse_power_table(struct radeon_device *rdev)
non_clock_array_index = power_state->v2.nonClockInfoIndex;
non_clock_info = (struct _ATOM_PPLIB_NONCLOCK_INFO *)
&non_clock_info_array->nonClockInfo[non_clock_array_index];
- if (!rdev->pm.power_state[i].clock_info)
- return -EINVAL;
+ if (!rdev->pm.power_state[i].clock_info) {
+ ret = -EINVAL;
+ goto err_free_ps;
+ }
ps = kzalloc(sizeof(struct ci_ps), GFP_KERNEL);
- if (ps == NULL)
- return -ENOMEM;
+ if (ps == NULL) {
+ ret = -ENOMEM;
+ goto err_free_ps;
+ }
rdev->pm.dpm.ps[i].ps_priv = ps;
ci_parse_pplib_non_clock_info(rdev, &rdev->pm.dpm.ps[i],
non_clock_info,
@@ -5590,6 +5595,12 @@ static int ci_parse_power_table(struct radeon_device *rdev)
}
return 0;
+
+err_free_ps:
+ for (i = 0; i < rdev->pm.dpm.num_ps; i++)
+ kfree(rdev->pm.dpm.ps[i].ps_priv);
+ kfree(rdev->pm.dpm.ps);
+ return ret;
}
static int ci_get_vbios_boot_values(struct radeon_device *rdev,
@@ -5678,25 +5689,26 @@ int ci_dpm_init(struct radeon_device *rdev)
ret = ci_get_vbios_boot_values(rdev, &pi->vbios_boot_state);
if (ret) {
- ci_dpm_fini(rdev);
+ kfree(rdev->pm.dpm.priv);
return ret;
}
ret = r600_get_platform_caps(rdev);
if (ret) {
- ci_dpm_fini(rdev);
+ kfree(rdev->pm.dpm.priv);
return ret;
}
ret = r600_parse_extended_power_table(rdev);
if (ret) {
- ci_dpm_fini(rdev);
+ kfree(rdev->pm.dpm.priv);
return ret;
}
ret = ci_parse_power_table(rdev);
if (ret) {
- ci_dpm_fini(rdev);
+ kfree(rdev->pm.dpm.priv);
+ r600_free_extended_power_table(rdev);
return ret;
}
Smatch detected a double free path because lpfc_nlp_not_used releases an
ndlp object before reaching lpfc_nlp_put at the end of
lpfc_cmpl_els_logo_acc.
Remove the outdated lpfc_nlp_not_used routine. In
lpfc_mbx_cmpl_ns_reg_login, replace the call with lpfc_nlp_put. In
lpfc_cmpl_els_logo_acc, replace the call with lpfc_unreg_rpi and keep the
lpfc_nlp_put at the end of the routine. If ndlp's rpi was registered, then
lpfc_unreg_rpi's completion routine performs the final ndlp clean up after
lpfc_nlp_put is called from lpfc_cmpl_els_logo_acc. Otherwise if ndlp has
no rpi registered, the lpfc_nlp_put at the end of lpfc_cmpl_els_logo_acc is
the final ndlp clean up.
Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking")
Cc: <stable(a)vger.kernel.org> # v5.11+
Reported-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
---
drivers/scsi/lpfc/lpfc_crtn.h | 1 -
drivers/scsi/lpfc/lpfc_els.c | 30 +++++++-----------------------
drivers/scsi/lpfc/lpfc_hbadisc.c | 24 +++---------------------
3 files changed, 10 insertions(+), 45 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index b833b983e69d..0b9edde26abd 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -134,7 +134,6 @@ void lpfc_check_nlp_post_devloss(struct lpfc_vport *vport,
struct lpfc_nodelist *ndlp);
void lpfc_ignore_els_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
struct lpfc_iocbq *rspiocb);
-int lpfc_nlp_not_used(struct lpfc_nodelist *ndlp);
struct lpfc_nodelist *lpfc_setup_disc_node(struct lpfc_vport *, uint32_t);
void lpfc_disc_list_loopmap(struct lpfc_vport *);
void lpfc_disc_start(struct lpfc_vport *);
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 6a15f879e517..a3c8550e9985 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -5205,14 +5205,9 @@ lpfc_els_free_iocb(struct lpfc_hba *phba, struct lpfc_iocbq *elsiocb)
*
* This routine is the completion callback function to the Logout (LOGO)
* Accept (ACC) Response ELS command. This routine is invoked to indicate
- * the completion of the LOGO process. It invokes the lpfc_nlp_not_used() to
- * release the ndlp if it has the last reference remaining (reference count
- * is 1). If succeeded (meaning ndlp released), it sets the iocb ndlp
- * field to NULL to inform the following lpfc_els_free_iocb() routine no
- * ndlp reference count needs to be decremented. Otherwise, the ndlp
- * reference use-count shall be decremented by the lpfc_els_free_iocb()
- * routine. Finally, the lpfc_els_free_iocb() is invoked to release the
- * IOCB data structure.
+ * the completion of the LOGO process. If the node has transitioned to NPR,
+ * this routine unregisters the RPI if it is still registered. The
+ * lpfc_els_free_iocb() is invoked to release the IOCB data structure.
**/
static void
lpfc_cmpl_els_logo_acc(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
@@ -5253,19 +5248,9 @@ lpfc_cmpl_els_logo_acc(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
(ndlp->nlp_last_elscmd == ELS_CMD_PLOGI))
goto out;
- /* NPort Recovery mode or node is just allocated */
- if (!lpfc_nlp_not_used(ndlp)) {
- /* A LOGO is completing and the node is in NPR state.
- * Just unregister the RPI because the node is still
- * required.
- */
+ if (ndlp->nlp_flag & NLP_RPI_REGISTERED)
lpfc_unreg_rpi(vport, ndlp);
- } else {
- /* Indicate the node has already released, should
- * not reference to it from within lpfc_els_free_iocb.
- */
- cmdiocb->ndlp = NULL;
- }
+
}
out:
/*
@@ -5285,9 +5270,8 @@ lpfc_cmpl_els_logo_acc(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
* RPI (Remote Port Index) mailbox command to the @phba. It simply releases
* the associated lpfc Direct Memory Access (DMA) buffer back to the pool and
* decrements the ndlp reference count held for this completion callback
- * function. After that, it invokes the lpfc_nlp_not_used() to check
- * whether there is only one reference left on the ndlp. If so, it will
- * perform one more decrement and trigger the release of the ndlp.
+ * function. After that, it invokes the lpfc_drop_node to check
+ * whether it is appropriate to release the node.
**/
void
lpfc_mbx_cmpl_dflt_rpi(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 5ba3a9ad9501..67bfdddb897c 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -4333,13 +4333,14 @@ lpfc_mbx_cmpl_ns_reg_login(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
/* If the node is not registered with the scsi or nvme
* transport, remove the fabric node. The failed reg_login
- * is terminal.
+ * is terminal and forces the removal of the last node
+ * reference.
*/
if (!(ndlp->fc4_xpt_flags & (SCSI_XPT_REGD | NVME_XPT_REGD))) {
spin_lock_irq(&ndlp->lock);
ndlp->nlp_flag &= ~NLP_NPR_2B_DISC;
spin_unlock_irq(&ndlp->lock);
- lpfc_nlp_not_used(ndlp);
+ lpfc_nlp_put(ndlp);
}
if (phba->fc_topology == LPFC_TOPOLOGY_LOOP) {
@@ -6704,25 +6705,6 @@ lpfc_nlp_put(struct lpfc_nodelist *ndlp)
return ndlp ? kref_put(&ndlp->kref, lpfc_nlp_release) : 0;
}
-/* This routine free's the specified nodelist if it is not in use
- * by any other discovery thread. This routine returns 1 if the
- * ndlp has been freed. A return value of 0 indicates the ndlp is
- * not yet been released.
- */
-int
-lpfc_nlp_not_used(struct lpfc_nodelist *ndlp)
-{
- lpfc_debugfs_disc_trc(ndlp->vport, LPFC_DISC_TRC_NODE,
- "node not used: did:x%x flg:x%x refcnt:x%x",
- ndlp->nlp_DID, ndlp->nlp_flag,
- kref_read(&ndlp->kref));
-
- if (kref_read(&ndlp->kref) == 1)
- if (lpfc_nlp_put(ndlp))
- return 1;
- return 0;
-}
-
/**
* lpfc_fcf_inuse - Check if FCF can be unregistered.
* @phba: Pointer to hba context object.
--
2.38.0
Previously, capability was checked using capable(), which verified that the
caller of the ioctl system call had the required capability. In addition,
the result of the check would be stored in the HCI_SOCK_TRUSTED flag,
making it persistent for the socket.
However, malicious programs can abuse this approach by deliberately sharing
an HCI socket with a privileged task. The HCI socket will be marked as
trusted when the privileged task occasionally makes an ioctl call.
This problem can be solved by using sk_capable() to check capability, which
ensures that not only the current task but also the socket opener has the
specified capability, thus reducing the risk of privilege escalation
through the previously identified vulnerability.
Cc: stable(a)vger.kernel.org
Fixes: f81f5b2db869 ("Bluetooth: Send control open and close messages for HCI raw sockets")
Signed-off-by: Ruihan Li <lrh2000(a)pku.edu.cn>
---
net/bluetooth/hci_sock.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 065812232..f597fe0db 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -1003,7 +1003,14 @@ static int hci_sock_ioctl(struct socket *sock, unsigned int cmd,
if (hci_sock_gen_cookie(sk)) {
struct sk_buff *skb;
- if (capable(CAP_NET_ADMIN))
+ /* Perform careful checks before setting the HCI_SOCK_TRUSTED
+ * flag. Make sure that not only the current task but also
+ * the socket opener has the required capability, since
+ * privileged programs can be tricked into making ioctl calls
+ * on HCI sockets, and the socket should not be marked as
+ * trusted simply because the ioctl caller is privileged.
+ */
+ if (sk_capable(sk, CAP_NET_ADMIN))
hci_sock_set_flag(sk, HCI_SOCK_TRUSTED);
/* Send event to monitor */
--
2.40.0
commit ba9182a89626d5f83c2ee4594f55cb9c1e60f0e2 upstream.
After a successful cpuset_can_attach() call which increments the
attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach()
will be called later. In cpuset_attach(), tasks in cpuset_attach_wq,
if present, will be woken up at the end. That is not the case in
cpuset_cancel_attach(). So missed wakeup is possible if the attach
operation is somehow cancelled. Fix that by doing the wakeup in
cpuset_cancel_attach() as well.
Fixes: e44193d39e8d ("cpuset: let hotplug propagation work wait for task attaching")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Cc: stable(a)vger.kernel.org # v3.11+
Signed-off-by: Tejun Heo <tj(a)kernel.org>
---
kernel/cgroup/cpuset.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0b82e3857d33..20c1ccc6a0bb 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1508,7 +1508,9 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
cs = css_cs(css);
mutex_lock(&cpuset_mutex);
- css_cs(css)->attach_in_progress--;
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
mutex_unlock(&cpuset_mutex);
}
--
2.31.1
commit ba9182a89626d5f83c2ee4594f55cb9c1e60f0e2 upstream.
After a successful cpuset_can_attach() call which increments the
attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach()
will be called later. In cpuset_attach(), tasks in cpuset_attach_wq,
if present, will be woken up at the end. That is not the case in
cpuset_cancel_attach(). So missed wakeup is possible if the attach
operation is somehow cancelled. Fix that by doing the wakeup in
cpuset_cancel_attach() as well.
Fixes: e44193d39e8d ("cpuset: let hotplug propagation work wait for task attaching")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Cc: stable(a)vger.kernel.org # v3.11+
Signed-off-by: Tejun Heo <tj(a)kernel.org>
---
kernel/cgroup/cpuset.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c6d412cebc43..3067d3e5a51d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1504,7 +1504,9 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
cs = css_cs(css);
mutex_lock(&cpuset_mutex);
- css_cs(css)->attach_in_progress--;
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
mutex_unlock(&cpuset_mutex);
}
--
2.31.1
Hi,
please pick
1e020e1b96afd ("ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size")
from linux-next to stable trees.
The commit fixes a problem with another recent commit
1b42b1a36fc94 ("ubi: ensure that VID header offset ... size")
which has already made it to stable trees and thereby broke attaching
UBI and hence renders devices unbootable.
As a temporary fix I have applied this patch downstream in OpenWrt.
Thank you!
Best regards
Daniel
----- Forwarded message from Daniel Golle <daniel(a)makrotopia.org> -----
Date: Sat, 15 Apr 2023 00:12:46 +0100
From: Daniel Golle <daniel(a)makrotopia.org>
To: Richard Weinberger <richard(a)nod.at>
Cc: Miquel Raynal <miquel.raynal(a)bootlin.com>, chengzhihao1 <chengzhihao1(a)huawei.com>, Nicolas Schichan <nschichan(a)freebox.fr>, George Kennedy <george.kennedy(a)oracle.com>, linux-kernel <linux-kernel(a)vger.kernel.org>, linux-mtd
<linux-mtd(a)lists.infradead.org>, Sascha Hauer <s.hauer(a)pengutronix.de>, yi zhang <yi.zhang(a)huawei.com>
Subject: Re: [PATCH -next v3] ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
Hi Richard,
On Wed, Mar 29, 2023 at 11:33:40PM +0200, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
> > Von: "Miquel Raynal" <miquel.raynal(a)bootlin.com>
> >> Thanks for testing.
> >>
> >> > Tested-by: Nicolas Schichan <nschichan(a)freebox.fr>
> >
> > Same here.
> >
> > Tested-by: Miquel Raynal <miquel.raynal(a)bootlin.com> # v5.10, v4.19
>
> Applied to next, PR will follow soon.
As stable linux trees are affected I wonder when this will hit
linux-stable, ie. will it be part of 5.15.108, for example?
Cheers
Daniel
----- End forwarded message -----
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x eee87853794187f6adbe19533ed79c8b44b36a91
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041705-gauze-elated-d7c7@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
eee878537941 ("cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods")
42a11bf5c543 ("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
18f9a4d47527 ("cgroup/cpuset: Skip spread flags update on v2")
e2d59900d936 ("cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective")
18065ebe9b33 ("cgroup/cpuset: Miscellaneous cleanups & add helper functions")
f9da322e864e ("cgroup: cleanup comments")
8ca1b5a49885 ("mm/page_alloc: detect allocation forbidden by cpuset and bail out early")
b94f9ac79a73 ("cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem")
69dc8010b8fc ("Merge branch 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eee87853794187f6adbe19533ed79c8b44b36a91 Mon Sep 17 00:00:00 2001
From: Waiman Long <longman(a)redhat.com>
Date: Tue, 11 Apr 2023 09:35:59 -0400
Subject: [PATCH] cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork()
methods
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.
We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.
Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: stable(a)vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2ccfae74acf9..166a45019f66 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2453,6 +2453,20 @@ static int fmeter_getrate(struct fmeter *fmp)
static struct cpuset *cpuset_attach_old_cs;
+/*
+ * Check to see if a cpuset can accept a new task
+ * For v1, cpus_allowed and mems_allowed can't be empty.
+ * For v2, effective_cpus can't be empty.
+ * Note that in v1, effective_cpus = cpus_allowed.
+ */
+static int cpuset_can_attach_check(struct cpuset *cs)
+{
+ if (cpumask_empty(cs->effective_cpus) ||
+ (!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
+ return -ENOSPC;
+ return 0;
+}
+
/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
@@ -2467,16 +2481,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
percpu_down_write(&cpuset_rwsem);
- /* allow moving tasks into an empty cpuset if on default hierarchy */
- ret = -ENOSPC;
- if (!is_in_v2_mode() &&
- (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
- goto out_unlock;
-
- /*
- * Task cannot be moved to a cpuset with empty effective cpus.
- */
- if (cpumask_empty(cs->effective_cpus))
+ /* Check to see if task is allowed in the cpuset */
+ ret = cpuset_can_attach_check(cs);
+ if (ret)
goto out_unlock;
cgroup_taskset_for_each(task, css, tset) {
@@ -2493,7 +2500,6 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
* changes which zero cpus/mems_allowed.
*/
cs->attach_in_progress++;
- ret = 0;
out_unlock:
percpu_up_write(&cpuset_rwsem);
return ret;
@@ -3264,6 +3270,68 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
percpu_up_write(&cpuset_rwsem);
}
+/*
+ * In case the child is cloned into a cpuset different from its parent,
+ * additional checks are done to see if the move is allowed.
+ */
+static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
+{
+ struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]);
+ bool same_cs;
+ int ret;
+
+ rcu_read_lock();
+ same_cs = (cs == task_cs(current));
+ rcu_read_unlock();
+
+ if (same_cs)
+ return 0;
+
+ lockdep_assert_held(&cgroup_mutex);
+ percpu_down_write(&cpuset_rwsem);
+
+ /* Check to see if task is allowed in the cpuset */
+ ret = cpuset_can_attach_check(cs);
+ if (ret)
+ goto out_unlock;
+
+ ret = task_can_attach(task, cs->effective_cpus);
+ if (ret)
+ goto out_unlock;
+
+ ret = security_task_setscheduler(task);
+ if (ret)
+ goto out_unlock;
+
+ /*
+ * Mark attach is in progress. This makes validate_change() fail
+ * changes which zero cpus/mems_allowed.
+ */
+ cs->attach_in_progress++;
+out_unlock:
+ percpu_up_write(&cpuset_rwsem);
+ return ret;
+}
+
+static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset)
+{
+ struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]);
+ bool same_cs;
+
+ rcu_read_lock();
+ same_cs = (cs == task_cs(current));
+ rcu_read_unlock();
+
+ if (same_cs)
+ return;
+
+ percpu_down_write(&cpuset_rwsem);
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
+ percpu_up_write(&cpuset_rwsem);
+}
+
/*
* Make sure the new task conform to the current state of its parent,
* which could have been changed by cpuset just after it inherits the
@@ -3292,6 +3360,11 @@ static void cpuset_fork(struct task_struct *task)
percpu_down_write(&cpuset_rwsem);
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
cpuset_attach_task(cs, task);
+
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
+
percpu_up_write(&cpuset_rwsem);
}
@@ -3305,6 +3378,8 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
.attach = cpuset_attach,
.post_attach = cpuset_post_attach,
.bind = cpuset_bind,
+ .can_fork = cpuset_can_fork,
+ .cancel_fork = cpuset_cancel_fork,
.fork = cpuset_fork,
.legacy_cftypes = legacy_files,
.dfl_cftypes = dfl_files,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x eee87853794187f6adbe19533ed79c8b44b36a91
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041703-wildland-privacy-e6d6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
eee878537941 ("cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods")
42a11bf5c543 ("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
18f9a4d47527 ("cgroup/cpuset: Skip spread flags update on v2")
e2d59900d936 ("cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective")
18065ebe9b33 ("cgroup/cpuset: Miscellaneous cleanups & add helper functions")
f9da322e864e ("cgroup: cleanup comments")
8ca1b5a49885 ("mm/page_alloc: detect allocation forbidden by cpuset and bail out early")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eee87853794187f6adbe19533ed79c8b44b36a91 Mon Sep 17 00:00:00 2001
From: Waiman Long <longman(a)redhat.com>
Date: Tue, 11 Apr 2023 09:35:59 -0400
Subject: [PATCH] cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork()
methods
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.
We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.
Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: stable(a)vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2ccfae74acf9..166a45019f66 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2453,6 +2453,20 @@ static int fmeter_getrate(struct fmeter *fmp)
static struct cpuset *cpuset_attach_old_cs;
+/*
+ * Check to see if a cpuset can accept a new task
+ * For v1, cpus_allowed and mems_allowed can't be empty.
+ * For v2, effective_cpus can't be empty.
+ * Note that in v1, effective_cpus = cpus_allowed.
+ */
+static int cpuset_can_attach_check(struct cpuset *cs)
+{
+ if (cpumask_empty(cs->effective_cpus) ||
+ (!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
+ return -ENOSPC;
+ return 0;
+}
+
/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
@@ -2467,16 +2481,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
percpu_down_write(&cpuset_rwsem);
- /* allow moving tasks into an empty cpuset if on default hierarchy */
- ret = -ENOSPC;
- if (!is_in_v2_mode() &&
- (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
- goto out_unlock;
-
- /*
- * Task cannot be moved to a cpuset with empty effective cpus.
- */
- if (cpumask_empty(cs->effective_cpus))
+ /* Check to see if task is allowed in the cpuset */
+ ret = cpuset_can_attach_check(cs);
+ if (ret)
goto out_unlock;
cgroup_taskset_for_each(task, css, tset) {
@@ -2493,7 +2500,6 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
* changes which zero cpus/mems_allowed.
*/
cs->attach_in_progress++;
- ret = 0;
out_unlock:
percpu_up_write(&cpuset_rwsem);
return ret;
@@ -3264,6 +3270,68 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
percpu_up_write(&cpuset_rwsem);
}
+/*
+ * In case the child is cloned into a cpuset different from its parent,
+ * additional checks are done to see if the move is allowed.
+ */
+static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
+{
+ struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]);
+ bool same_cs;
+ int ret;
+
+ rcu_read_lock();
+ same_cs = (cs == task_cs(current));
+ rcu_read_unlock();
+
+ if (same_cs)
+ return 0;
+
+ lockdep_assert_held(&cgroup_mutex);
+ percpu_down_write(&cpuset_rwsem);
+
+ /* Check to see if task is allowed in the cpuset */
+ ret = cpuset_can_attach_check(cs);
+ if (ret)
+ goto out_unlock;
+
+ ret = task_can_attach(task, cs->effective_cpus);
+ if (ret)
+ goto out_unlock;
+
+ ret = security_task_setscheduler(task);
+ if (ret)
+ goto out_unlock;
+
+ /*
+ * Mark attach is in progress. This makes validate_change() fail
+ * changes which zero cpus/mems_allowed.
+ */
+ cs->attach_in_progress++;
+out_unlock:
+ percpu_up_write(&cpuset_rwsem);
+ return ret;
+}
+
+static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset)
+{
+ struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]);
+ bool same_cs;
+
+ rcu_read_lock();
+ same_cs = (cs == task_cs(current));
+ rcu_read_unlock();
+
+ if (same_cs)
+ return;
+
+ percpu_down_write(&cpuset_rwsem);
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
+ percpu_up_write(&cpuset_rwsem);
+}
+
/*
* Make sure the new task conform to the current state of its parent,
* which could have been changed by cpuset just after it inherits the
@@ -3292,6 +3360,11 @@ static void cpuset_fork(struct task_struct *task)
percpu_down_write(&cpuset_rwsem);
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
cpuset_attach_task(cs, task);
+
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
+
percpu_up_write(&cpuset_rwsem);
}
@@ -3305,6 +3378,8 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
.attach = cpuset_attach,
.post_attach = cpuset_post_attach,
.bind = cpuset_bind,
+ .can_fork = cpuset_can_fork,
+ .cancel_fork = cpuset_cancel_fork,
.fork = cpuset_fork,
.legacy_cftypes = legacy_files,
.dfl_cftypes = dfl_files,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d83806c4c0cccc0d6d3c3581a11983a9c186a138
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041747-monogram-playpen-8e77@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d83806c4c0cc ("purgatory: fix disabling debug info")
56e0790c7f9e ("RISC-V: add infrastructure to allow different str* implementations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d83806c4c0cccc0d6d3c3581a11983a9c186a138 Mon Sep 17 00:00:00 2001
From: Alyssa Ross <hi(a)alyssa.is>
Date: Sun, 26 Mar 2023 18:21:21 +0000
Subject: [PATCH] purgatory: fix disabling debug info
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index d16bf715a586..5730797a6b40 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -84,12 +84,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_ctype.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strcmp.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strlen.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strncmp.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 17f09dc26381..82fec66d46d2 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x eee87853794187f6adbe19533ed79c8b44b36a91
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041702-vertebrae-bonsai-f1de@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
eee878537941 ("cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods")
42a11bf5c543 ("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
18f9a4d47527 ("cgroup/cpuset: Skip spread flags update on v2")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eee87853794187f6adbe19533ed79c8b44b36a91 Mon Sep 17 00:00:00 2001
From: Waiman Long <longman(a)redhat.com>
Date: Tue, 11 Apr 2023 09:35:59 -0400
Subject: [PATCH] cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork()
methods
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.
We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.
Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: stable(a)vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2ccfae74acf9..166a45019f66 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2453,6 +2453,20 @@ static int fmeter_getrate(struct fmeter *fmp)
static struct cpuset *cpuset_attach_old_cs;
+/*
+ * Check to see if a cpuset can accept a new task
+ * For v1, cpus_allowed and mems_allowed can't be empty.
+ * For v2, effective_cpus can't be empty.
+ * Note that in v1, effective_cpus = cpus_allowed.
+ */
+static int cpuset_can_attach_check(struct cpuset *cs)
+{
+ if (cpumask_empty(cs->effective_cpus) ||
+ (!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
+ return -ENOSPC;
+ return 0;
+}
+
/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
@@ -2467,16 +2481,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
percpu_down_write(&cpuset_rwsem);
- /* allow moving tasks into an empty cpuset if on default hierarchy */
- ret = -ENOSPC;
- if (!is_in_v2_mode() &&
- (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
- goto out_unlock;
-
- /*
- * Task cannot be moved to a cpuset with empty effective cpus.
- */
- if (cpumask_empty(cs->effective_cpus))
+ /* Check to see if task is allowed in the cpuset */
+ ret = cpuset_can_attach_check(cs);
+ if (ret)
goto out_unlock;
cgroup_taskset_for_each(task, css, tset) {
@@ -2493,7 +2500,6 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
* changes which zero cpus/mems_allowed.
*/
cs->attach_in_progress++;
- ret = 0;
out_unlock:
percpu_up_write(&cpuset_rwsem);
return ret;
@@ -3264,6 +3270,68 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
percpu_up_write(&cpuset_rwsem);
}
+/*
+ * In case the child is cloned into a cpuset different from its parent,
+ * additional checks are done to see if the move is allowed.
+ */
+static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
+{
+ struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]);
+ bool same_cs;
+ int ret;
+
+ rcu_read_lock();
+ same_cs = (cs == task_cs(current));
+ rcu_read_unlock();
+
+ if (same_cs)
+ return 0;
+
+ lockdep_assert_held(&cgroup_mutex);
+ percpu_down_write(&cpuset_rwsem);
+
+ /* Check to see if task is allowed in the cpuset */
+ ret = cpuset_can_attach_check(cs);
+ if (ret)
+ goto out_unlock;
+
+ ret = task_can_attach(task, cs->effective_cpus);
+ if (ret)
+ goto out_unlock;
+
+ ret = security_task_setscheduler(task);
+ if (ret)
+ goto out_unlock;
+
+ /*
+ * Mark attach is in progress. This makes validate_change() fail
+ * changes which zero cpus/mems_allowed.
+ */
+ cs->attach_in_progress++;
+out_unlock:
+ percpu_up_write(&cpuset_rwsem);
+ return ret;
+}
+
+static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset)
+{
+ struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]);
+ bool same_cs;
+
+ rcu_read_lock();
+ same_cs = (cs == task_cs(current));
+ rcu_read_unlock();
+
+ if (same_cs)
+ return;
+
+ percpu_down_write(&cpuset_rwsem);
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
+ percpu_up_write(&cpuset_rwsem);
+}
+
/*
* Make sure the new task conform to the current state of its parent,
* which could have been changed by cpuset just after it inherits the
@@ -3292,6 +3360,11 @@ static void cpuset_fork(struct task_struct *task)
percpu_down_write(&cpuset_rwsem);
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
cpuset_attach_task(cs, task);
+
+ cs->attach_in_progress--;
+ if (!cs->attach_in_progress)
+ wake_up(&cpuset_attach_wq);
+
percpu_up_write(&cpuset_rwsem);
}
@@ -3305,6 +3378,8 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
.attach = cpuset_attach,
.post_attach = cpuset_post_attach,
.bind = cpuset_bind,
+ .can_fork = cpuset_can_fork,
+ .cancel_fork = cpuset_cancel_fork,
.fork = cpuset_fork,
.legacy_cftypes = legacy_files,
.dfl_cftypes = dfl_files,
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x d83806c4c0cccc0d6d3c3581a11983a9c186a138
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041746-daybed-posing-b660@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
d83806c4c0cc ("purgatory: fix disabling debug info")
56e0790c7f9e ("RISC-V: add infrastructure to allow different str* implementations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d83806c4c0cccc0d6d3c3581a11983a9c186a138 Mon Sep 17 00:00:00 2001
From: Alyssa Ross <hi(a)alyssa.is>
Date: Sun, 26 Mar 2023 18:21:21 +0000
Subject: [PATCH] purgatory: fix disabling debug info
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index d16bf715a586..5730797a6b40 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -84,12 +84,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_ctype.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strcmp.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strlen.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strncmp.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 17f09dc26381..82fec66d46d2 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
During a suspend/resume cycle the VO power domain will be disabled and
the VOP2 registers will reset to their default values. After that the
cached register values will be out of sync and the read/modify/write
operations we do on the window registers will result in bogus values
written. Fix this by marking the regcache as dirty each time we disable
the VOP2 and call regcache_sync() each time we enable it again. With
this the VOP2 will show a picture after a suspend/resume cycle whereas
without this the screen stays dark.
Fixes: 604be85547ce4 ("drm/rockchip: Add VOP2 driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
---
Changes since v1:
- Use regcache_mark_dirty()/regcache_sync() instead of regmap_reinit_cache()
drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index ba3b817895091..293c228a83f90 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -839,6 +839,8 @@ static void vop2_enable(struct vop2 *vop2)
return;
}
+ regcache_sync(vop2->map);
+
if (vop2->data->soc_id == 3566)
vop2_writel(vop2, RK3568_OTP_WIN_EN, 1);
@@ -867,6 +869,8 @@ static void vop2_disable(struct vop2 *vop2)
pm_runtime_put_sync(vop2->dev);
+ regcache_mark_dirty(vop2->map);
+
clk_disable_unprepare(vop2->aclk);
clk_disable_unprepare(vop2->hclk);
}
--
2.39.2
Per-vcpu flags are updated using a non-atomic RMW operation.
Which means it is possible to get preempted between the read and
write operations.
Another interesting thing to note is that preemption also updates
flags, as we have some flag manipulation in both the load and put
operations.
It is thus possible to lose information communicated by either
load or put, as the preempted flag update will overwrite the flags
when the thread is resumed. This is specially critical if either
load or put has stored information which depends on the physical
CPU the vcpu runs on.
This results in really elusive bugs, and kudos must be given to
Mostafa for the long hours of debugging, and finally spotting
the problem.
Fixes: e87abb73e594 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set")
Reported-by: Mostafa Saleh <smostafa(a)google.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/include/asm/kvm_host.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bcd774d74f34..d716cfd806e8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -579,6 +579,19 @@ struct kvm_vcpu_arch {
v->arch.flagset & (m); \
})
+/*
+ * Note that the set/clear accessors must be preempt-safe in order to
+ * avoid nesting them with load/put which also manipulate flags...
+ */
+#ifdef __KVM_NVHE_HYPERVISOR__
+/* the nVHE hypervisor is always non-preemptible */
+#define __vcpu_flags_preempt_disable()
+#define __vcpu_flags_preempt_enable()
+#else
+#define __vcpu_flags_preempt_disable() preempt_disable()
+#define __vcpu_flags_preempt_enable() preempt_enable()
+#endif
+
#define __vcpu_set_flag(v, flagset, f, m) \
do { \
typeof(v->arch.flagset) *fset; \
@@ -586,9 +599,11 @@ struct kvm_vcpu_arch {
__build_check_flag(v, flagset, f, m); \
\
fset = &v->arch.flagset; \
+ __vcpu_flags_preempt_disable(); \
if (HWEIGHT(m) > 1) \
*fset &= ~(m); \
*fset |= (f); \
+ __vcpu_flags_preempt_enable(); \
} while (0)
#define __vcpu_clear_flag(v, flagset, f, m) \
@@ -598,7 +613,9 @@ struct kvm_vcpu_arch {
__build_check_flag(v, flagset, f, m); \
\
fset = &v->arch.flagset; \
+ __vcpu_flags_preempt_disable(); \
*fset &= ~(m); \
+ __vcpu_flags_preempt_enable(); \
} while (0)
#define vcpu_get_flag(v, ...) __vcpu_get_flag((v), __VA_ARGS__)
--
2.34.1