I'm announcing the release of the 4.19.138 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
arch/arm/include/asm/percpu.h | 2 +
drivers/char/random.c | 1
fs/ext4/inode.c | 5 ++
include/linux/prandom.h | 78 ++++++++++++++++++++++++++++++++++++++++++
include/linux/random.h | 63 ++-------------------------------
kernel/time/timer.c | 8 ++++
lib/random32.c | 2 -
8 files changed, 100 insertions(+), 61 deletions(-)
Greg Kroah-Hartman (1):
Linux 4.19.138
Grygorii Strashko (1):
ARM: percpu.h: fix build error
Jiang Ying (1):
ext4: fix direct I/O read error
Linus Torvalds (2):
random32: remove net_rand_state from the latent entropy gcc plugin
random32: move the pseudo-random 32-bit definitions to prandom.h
Willy Tarreau (2):
random32: update the net random state on interrupt and activity
random: fix circular include dependency on arm64 after addition of percpu.h
I'm announcing the release of the 4.14.193 kernel.
All users of the 4.14 kernel series must upgrade.
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/include/asm/percpu.h | 2
arch/arm/kernel/head-common.S | 1
drivers/char/random.c | 1
drivers/scsi/libsas/sas_ata.c | 1
drivers/scsi/libsas/sas_discover.c | 32 ++++++---------
drivers/scsi/libsas/sas_expander.c | 8 ++-
drivers/scsi/libsas/sas_internal.h | 1
drivers/scsi/libsas/sas_port.c | 3 -
fs/ext4/inode.c | 5 ++
include/linux/prandom.h | 78 +++++++++++++++++++++++++++++++++++++
include/linux/random.h | 63 +----------------------------
include/scsi/libsas.h | 3 -
include/scsi/scsi_transport_sas.h | 1
kernel/time/timer.c | 8 +++
lib/random32.c | 2
16 files changed, 123 insertions(+), 88 deletions(-)
Geert Uytterhoeven (1):
ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()
Greg Kroah-Hartman (2):
Revert "scsi: libsas: direct call probe and destruct"
Linux 4.14.193
Grygorii Strashko (1):
ARM: percpu.h: fix build error
Jiang Ying (1):
ext4: fix direct I/O read error
Linus Torvalds (2):
random32: remove net_rand_state from the latent entropy gcc plugin
random32: move the pseudo-random 32-bit definitions to prandom.h
Willy Tarreau (2):
random32: update the net random state on interrupt and activity
random: fix circular include dependency on arm64 after addition of percpu.h
This is the start of the stable review cycle for the 5.7.14 release.
There are 7 patches in this series; all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 Aug 2020 19:59:06 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.14-rc2…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.7.14-rc2
Linus Torvalds <torvalds(a)linux-foundation.org>
random: random.h should include archrandom.h, not the other way around
Marc Zyngier <maz(a)kernel.org>
arm64: Workaround circular dependency in pointer_auth.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: move the pseudo-random 32-bit definitions to prandom.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: remove net_rand_state from the latent entropy gcc plugin
Willy Tarreau <w(a)1wt.eu>
random: fix circular include dependency on arm64 after addition of percpu.h
Grygorii Strashko <grygorii.strashko(a)ti.com>
ARM: percpu.h: fix build error
Willy Tarreau <w(a)1wt.eu>
random32: update the net random state on interrupt and activity
-------------
Diffstat:
Makefile | 4 +-
arch/arm/include/asm/percpu.h | 2 +
arch/arm64/include/asm/archrandom.h | 1 -
arch/arm64/include/asm/pointer_auth.h | 8 +++-
arch/arm64/kernel/kaslr.c | 2 +-
drivers/char/random.c | 1 +
include/linux/prandom.h | 78 +++++++++++++++++++++++++++++++++++
include/linux/random.h | 63 ++--------------------------
kernel/time/timer.c | 8 ++++
lib/random32.c | 2 +-
10 files changed, 104 insertions(+), 65 deletions(-)
This is the start of the stable review cycle for the 4.19.138 release.
There are 6 patches in this series; all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 Aug 2020 15:34:53 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.138-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.138-rc1
Jiang Ying <jiangying8582(a)126.com>
ext4: fix direct I/O read error
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: move the pseudo-random 32-bit definitions to prandom.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: remove net_rand_state from the latent entropy gcc plugin
Willy Tarreau <w(a)1wt.eu>
random: fix circular include dependency on arm64 after addition of percpu.h
Grygorii Strashko <grygorii.strashko(a)ti.com>
ARM: percpu.h: fix build error
Willy Tarreau <w(a)1wt.eu>
random32: update the net random state on interrupt and activity
-------------
Diffstat:
Makefile | 4 +--
arch/arm/include/asm/percpu.h | 2 ++
drivers/char/random.c | 1 +
fs/ext4/inode.c | 5 +++
include/linux/prandom.h | 78 +++++++++++++++++++++++++++++++++++++++++++
include/linux/random.h | 63 +++-------------------------------
kernel/time/timer.c | 8 +++++
lib/random32.c | 2 +-
8 files changed, 101 insertions(+), 62 deletions(-)
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: khugepaged_test_exit() check mmget_still_valid()
Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself. collapse_huge_page() is used for anon THP
only, and earned its mmget_still_valid() check because it inserts a huge
pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry. But core dumping without mmap
lock must have been as open to mistaking a racily cleared pmd entry for a
page table at physical page 0, as exit_mmap() was. And we certainly have
no interest in mapping as a THP once dumping core.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
--- a/mm/khugepaged.c~khugepaged-khugepaged_test_exit-check-mmget_still_valid
+++ a/mm/khugepaged.c
@@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(stru
static inline int khugepaged_test_exit(struct mm_struct *mm)
{
- return atomic_read(&mm->mm_users) == 0;
+ return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
}
static bool hugepage_vma_check(struct vm_area_struct *vma,
@@ -1100,9 +1100,6 @@ static void collapse_huge_page(struct mm
* handled by the anon_vma lock + PG_lock.
*/
mmap_write_lock(mm);
- result = SCAN_ANY_PROCESS;
- if (!mmget_still_valid(mm))
- goto out;
result = hugepage_vma_revalidate(mm, address, &vma);
if (result)
goto out;
_
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: retract_page_tables() remember to test exit
Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.
The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.
In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example. But retract_page_tables() did not: fix that.
The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
--- a/mm/khugepaged.c~khugepaged-retract_page_tables-remember-to-test-exit
+++ a/mm/khugepaged.c
@@ -1532,6 +1532,7 @@ out:
static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
{
struct vm_area_struct *vma;
+ struct mm_struct *mm;
unsigned long addr;
pmd_t *pmd, _pmd;
@@ -1560,7 +1561,8 @@ static void retract_page_tables(struct a
continue;
if (vma->vm_end < addr + HPAGE_PMD_SIZE)
continue;
- pmd = mm_find_pmd(vma->vm_mm, addr);
+ mm = vma->vm_mm;
+ pmd = mm_find_pmd(mm, addr);
if (!pmd)
continue;
/*
@@ -1570,17 +1572,19 @@ static void retract_page_tables(struct a
* mmap_lock while holding page lock. Fault path does it in
* reverse order. Trylock is a way to avoid deadlock.
*/
- if (mmap_write_trylock(vma->vm_mm)) {
- spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
- /* assume page table is clear */
- _pmd = pmdp_collapse_flush(vma, addr, pmd);
- spin_unlock(ptl);
- mmap_write_unlock(vma->vm_mm);
- mm_dec_nr_ptes(vma->vm_mm);
- pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+ if (mmap_write_trylock(mm)) {
+ if (!khugepaged_test_exit(mm)) {
+ spinlock_t *ptl = pmd_lock(mm, pmd);
+ /* assume page table is clear */
+ _pmd = pmdp_collapse_flush(vma, addr, pmd);
+ spin_unlock(ptl);
+ mm_dec_nr_ptes(mm);
+ pte_free(mm, pmd_pgtable(_pmd));
+ }
+ mmap_write_unlock(mm);
} else {
/* Try again later */
- khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+ khugepaged_add_pte_mapped_thp(mm, addr);
}
}
i_mmap_unlock_write(mapping);
_
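The ordering retract_page_tables() now follows (trylock the mmap lock, then re-check khugepaged_test_exit() before touching the page table, and unlock only afterwards) can be modelled in userspace. This is a minimal single-threaded sketch, not kernel code: struct mm_model, model_trylock(), model_test_exit() and model_retract() are hypothetical names, a C11 atomic_flag stands in for mmap_write_trylock(), and the `dumping` flag stands in for the !mmget_still_valid() condition that the previous patch folded into khugepaged_test_exit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the few mm_struct fields involved. */
struct mm_model {
	atomic_flag mmap_lock;  /* models the mmap write lock */
	atomic_int users;       /* models mm->mm_users */
	bool dumping;           /* models !mmget_still_valid(): core dump running */
	int *page_table;        /* models the page table the pmd points to */
};

/* Like mmap_write_trylock(): true if the lock was acquired. */
static bool model_trylock(struct mm_model *mm)
{
	return !atomic_flag_test_and_set(&mm->mmap_lock);
}

static void model_unlock(struct mm_model *mm)
{
	atomic_flag_clear(&mm->mmap_lock);
}

/* Like khugepaged_test_exit() after the first patch in this series. */
static bool model_test_exit(struct mm_model *mm)
{
	return atomic_load(&mm->users) == 0 || mm->dumping;
}

/*
 * Models the patched retract_page_tables() body for one vma.
 * Returns 1 if the page table was freed, 0 if the mm is exiting (the
 * exit path owns the table), -1 if the lock was busy (retry later).
 */
static int model_retract(struct mm_model *mm)
{
	int ret = 0;

	if (!model_trylock(mm))
		return -1;              /* khugepaged_add_pte_mapped_thp() path */
	if (!model_test_exit(mm)) {     /* the re-check this patch adds */
		free(mm->page_table);
		mm->page_table = NULL;
		ret = 1;
	}
	model_unlock(mm);               /* unlock only after the table is gone */
	return ret;
}
```

The key point the model tries to capture is that the exit test is made while the lock is held, so it cannot go stale before the table is freed; checking it before the trylock would reopen the race.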
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 44 +++++++++++++++++++-------------------------
1 file changed, 19 insertions(+), 25 deletions(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock
+++ a/mm/khugepaged.c
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
{
unsigned long haddr = addr & HPAGE_PMD_MASK;
struct vm_area_struct *vma = find_vma(mm, haddr);
- struct page *hpage = NULL;
+ struct page *hpage;
pte_t *start_pte, *pte;
pmd_t *pmd, _pmd;
spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
return;
+ hpage = find_lock_page(vma->vm_file->f_mapping,
+ linear_page_index(vma, haddr));
+ if (!hpage)
+ return;
+
+ if (!PageHead(hpage))
+ goto drop_hpage;
+
pmd = mm_find_pmd(mm, haddr);
if (!pmd)
- return;
+ goto drop_hpage;
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
page = vm_normal_page(vma, addr, *pte);
- if (!page || !PageCompound(page))
- goto abort;
-
- if (!hpage) {
- hpage = compound_head(page);
- /*
- * The mapping of the THP should not change.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may
- * change the page table, but the new page will
- * not pass PageCompound() check.
- */
- if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
- goto abort;
- }
-
/*
- * Confirm the page maps to the correct subpage.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may change
- * the page table, but the new page will not pass
- * PageCompound() check.
+ * Note that uprobe, debugger, or MAP_PRIVATE may change the
+ * page table, but the new page will not be a subpage of hpage.
*/
- if (WARN_ON(hpage + i != page))
+ if (hpage + i != page)
goto abort;
count++;
}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
pte_unmap_unlock(start_pte, ptl);
/* step 3: set proper refcount and mm_counters. */
- if (hpage) {
+ if (count) {
page_ref_sub(hpage, count);
add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+ unlock_page(hpage);
+ put_page(hpage);
return;
abort:
pte_unmap_unlock(start_pte, ptl);
+ goto drop_hpage;
}
static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
_
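After this patch, collapse_pte_mapped_thp() accepts an extent only if every present PTE maps exactly the i-th subpage of the already locked head page, i.e. `page == hpage + i`. The sketch below models that per-PTE check with a plain array. It is an illustration under assumptions, not kernel code: struct page_model, NR_SUBPAGES and ptes_map_hpage() are hypothetical, NULL stands for pte_none(), and 8 subpages are used instead of HPAGE_PMD_NR for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBPAGES 8  /* hypothetical; HPAGE_PMD_NR is 512 on x86-64 */

struct page_model { int id; };

/*
 * Returns the number of mapped subpages if every non-empty PTE maps the
 * subpage at the same offset within hpage, or -1 (the "abort" path) if
 * any PTE maps some other page, e.g. a COW copy made by a debugger.
 */
static int ptes_map_hpage(struct page_model *const ptes[NR_SUBPAGES],
			  const struct page_model *hpage)
{
	int count = 0;

	for (int i = 0; i < NR_SUBPAGES; i++) {
		if (ptes[i] == NULL)           /* pte_none(): skip */
			continue;
		if (ptes[i] != hpage + i)      /* not the i-th subpage: abort */
			return -1;
		count++;
	}
	return count;
}
```

Because hpage is known before the loop, the count returned here is what the patch then uses in place of the old `if (hpage)` test when dropping the references.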
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() flush the right range
pmdp_collapse_flush() should be given the start address at which the huge
page is mapped, haddr: it was given addr, which at that point has been
used as a local variable, incremented to the end address of the extent.
Found by source inspection while chasing a hugepage locking bug, which I
then could not explain by this. At first I thought this was very bad;
then saw that all of the page translations that were not flushed would
actually still point to the right pages afterwards, so harmless; then
realized that I know nothing of how different architectures and models
cache intermediate paging structures, so maybe it matters after all -
particularly since the page table concerned is immediately freed.
Much easier to fix than to think about.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-flush-the-right-range
+++ a/mm/khugepaged.c
@@ -1502,7 +1502,7 @@ void collapse_pte_mapped_thp(struct mm_s
/* step 4: collapse pmd */
ptl = pmd_lock(vma->vm_mm, pmd);
- _pmd = pmdp_collapse_flush(vma, addr, pmd);
+ _pmd = pmdp_collapse_flush(vma, haddr, pmd);
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
_
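The arithmetic behind this one-line fix can be checked in isolation. The sketch assumes, as in the generic pgtable code, that pmdp_collapse_flush() flushes HPAGE_PMD_SIZE bytes starting at the address it is given, and uses illustrative x86-64-like constants; walk_extent() and flush_covers() are hypothetical helpers. It shows that once the loop has advanced `addr` past the extent, a flush starting at `addr` does not overlap the huge page at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative x86-64-like values; other architectures differ. */
#define PAGE_SIZE      4096UL
#define HPAGE_PMD_NR   512UL
#define HPAGE_PMD_SIZE (PAGE_SIZE * HPAGE_PMD_NR)  /* 2 MiB */
#define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))

/* Does the flush range [start, start + HPAGE_PMD_SIZE) overlap the huge
 * page's range [haddr, haddr + HPAGE_PMD_SIZE)? */
static bool flush_covers(unsigned long start, unsigned long haddr)
{
	return start < haddr + HPAGE_PMD_SIZE && haddr < start + HPAGE_PMD_SIZE;
}

/* Models the per-PTE loops: addr walks the extent one page at a time
 * and is left pointing just past its end. */
static unsigned long walk_extent(unsigned long haddr)
{
	unsigned long addr = haddr;

	for (unsigned long i = 0; i < HPAGE_PMD_NR; i++)
		addr += PAGE_SIZE;  /* per-PTE work elided */
	return addr;
}
```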
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
This was found by code inspection only.
Firstly, the worst-case scenario should assume that the whole range is covered
by pmd sharing. The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m), where the adjusted range would be (0, 1g+2m) but the
expected range should be (0, 2g).
While at it, remove the loop, since it should not be required. With that,
the new code should be faster too when the invalidating range is huge.
Mike said:
: With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
: adjust to (0, 1g+2m) which is incorrect.
:
: We should cc stable. The original reason for adjusting the range was to
: prevent data corruption (getting wrong page). Since the range is not
: always adjusted correctly, the potential for corruption still exists.
:
: However, I am fairly confident that adjust_range_if_pmd_sharing_possible
: is only going to be called in two cases:
:
: 1) for a single page
: 2) for range == entire vma
:
: In those cases, the current code should produce the correct results.
:
: To be safe, let's just cc stable.
Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com
Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-calculation-of-adjust_range_if_pmd_sharing_possible
+++ a/mm/hugetlb.c
@@ -5314,25 +5314,21 @@ static bool vma_shareable(struct vm_area
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
{
- unsigned long check_addr;
+ unsigned long a_start, a_end;
if (!(vma->vm_flags & VM_MAYSHARE))
return;
- for (check_addr = *start; check_addr < *end; check_addr += PUD_SIZE) {
- unsigned long a_start = check_addr & PUD_MASK;
- unsigned long a_end = a_start + PUD_SIZE;
+ /* Extend the range to be PUD aligned for a worst case scenario */
+ a_start = ALIGN_DOWN(*start, PUD_SIZE);
+ a_end = ALIGN(*end, PUD_SIZE);
- /*
- * If sharing is possible, adjust start/end if necessary.
- */
- if (range_in_vma(vma, a_start, a_end)) {
- if (a_start < *start)
- *start = a_start;
- if (a_end > *end)
- *end = a_end;
- }
- }
+ /*
+ * Intersect the range with the vma range, since pmd sharing won't be
+ * across vma after all
+ */
+ *start = max(vma->vm_start, a_start);
+ *end = min(vma->vm_end, a_end);
}
/*
_
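The example from the commit message, a range (1g-2m, 1g+2m) inside a vma (0, 2g), can be replayed in userspace with the old loop and the new align-then-intersect code side by side. This is a sketch under assumptions: PUD_SIZE is taken as 1 GiB (x86-64 with 4-level paging), the VM_MAYSHARE check is omitted, and struct range, adjust_old() and adjust_new() are hypothetical; the ALIGN macros are local power-of-two equivalents of the kernel's ALIGN_DOWN() and ALIGN().

```c
#include <assert.h>

#define SZ_2M    (2UL << 20)
#define PUD_SIZE (1UL << 30)  /* assumed: 1 GiB */

/* Power-of-two equivalents of the kernel's ALIGN_DOWN() and ALIGN(). */
#define ALIGN_DOWN_P2(x, a) ((x) & ~((a) - 1))
#define ALIGN_UP_P2(x, a)   (((x) + (a) - 1) & ~((a) - 1))

struct range { unsigned long start, end; };

/* Old algorithm: widen only for PUD slots lying entirely inside the vma.
 * Note that r.end is updated inside the loop, as *end was. */
static struct range adjust_old(struct range vma, struct range r)
{
	for (unsigned long a = r.start; a < r.end; a += PUD_SIZE) {
		unsigned long a_start = a & ~(PUD_SIZE - 1);
		unsigned long a_end = a_start + PUD_SIZE;

		if (a_start >= vma.start && a_end <= vma.end) {  /* range_in_vma() */
			if (a_start < r.start)
				r.start = a_start;
			if (a_end > r.end)
				r.end = a_end;
		}
	}
	return r;
}

/* New algorithm: PUD-align outwards, then intersect with the vma. */
static struct range adjust_new(struct range vma, struct range r)
{
	unsigned long a_start = ALIGN_DOWN_P2(r.start, PUD_SIZE);
	unsigned long a_end = ALIGN_UP_P2(r.end, PUD_SIZE);

	r.start = a_start > vma.start ? a_start : vma.start;  /* max() */
	r.end = a_end < vma.end ? a_end : vma.end;            /* min() */
	return r;
}
```

With these definitions the old code stops at (0, 1g+2m), because the second PUD slot's check address 2g-2m is already beyond r.end, while the new code returns the expected (0, 2g).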
From: Michal Koutný <mkoutny(a)suse.com>
Subject: mm/page_counter.c: fix protection usage propagation
When a workload runs in cgroups that aren't directly below the root cgroup
and their parent specifies reclaim protection, the protection may end up
ineffective.
The reason is that propagate_protected_usage() is not called at every
level of the hierarchy on the way up. All the protected usage is
incorrectly accumulated in the workload's parent. This means that
siblings_low_usage is overestimated and effective protection
underestimated. Even though it is a transitional phenomenon (the uncharge
path does correct propagation and fixes the wrong children_low_usage), it
can undermine the intended protection unexpectedly.
We noticed this problem when we saw a swap-out in a descendant of a
protected memcg (an intermediate node) while the parent was conveniently
under its protection limit and the memory pressure was external to that
hierarchy. Michal pinpointed this to the wrong siblings_low_usage, which
led to the unwanted reclaim.
The fix is simply to update children_low_usage in the respective
ancestors in the charging path as well.
Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.org
Fixes: 230671533d64 ("mm: memory.low hierarchical behavior")
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.18+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_counter.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/page_counter.c~mm-fix-protection-usage-propagation
+++ a/mm/page_counter.c
@@ -72,7 +72,7 @@ void page_counter_charge(struct page_cou
long new;
new = atomic_long_add_return(nr_pages, &c->usage);
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* This is indeed racy, but we can live with some
* inaccuracy in the watermark.
@@ -116,7 +116,7 @@ bool page_counter_try_charge(struct page
new = atomic_long_add_return(nr_pages, &c->usage);
if (new > c->max) {
atomic_long_sub(nr_pages, &c->usage);
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* This is racy, but we can live with some
* inaccuracy in the failcnt.
@@ -125,7 +125,7 @@ bool page_counter_try_charge(struct page
*fail = c;
goto failed;
}
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* Just like with failcnt, we can live with some
* inaccuracy in the watermark.
_
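The effect of passing `counter` instead of `c` can be reproduced with a small single-threaded model of the charge loop. This is a simplified sketch, not the kernel's page_counter: struct counter, charge() and the `fixed` switch are hypothetical, only the memory.low fields are modelled (the kernel also tracks memory.min the same way and uses atomics), and the propagation logic follows the shape of propagate_protected_usage() as an assumption.

```c
#include <assert.h>

/* Simplified, hypothetical model of struct page_counter (low only). */
struct counter {
	long usage;
	long low;                 /* configured protection */
	long low_usage;           /* protected part of usage: min(usage, low) */
	long children_low_usage;  /* sum of the children's low_usage */
	struct counter *parent;
};

static void propagate_protected_usage(struct counter *c, long usage)
{
	long protected, delta;

	if (!c->parent)
		return;
	if (c->low || c->low_usage) {
		protected = usage < c->low ? usage : c->low;
		delta = protected - c->low_usage;
		c->low_usage = protected;
		c->parent->children_low_usage += delta;
	}
}

/* Charges every level from counter up to the root. fixed != 0 propagates
 * for each level c (the patch); otherwise for the leaf only (the bug). */
static void charge(struct counter *counter, long nr_pages, int fixed)
{
	for (struct counter *c = counter; c; c = c->parent) {
		long new;

		c->usage += nr_pages;
		new = c->usage;
		propagate_protected_usage(fixed ? c : counter, new);
	}
}
```

For a hierarchy root -> A (low = 100) -> B (low = 50) and a charge of 80 pages to B, the buggy variant never updates A's low_usage, so root's children_low_usage stays 0; the fixed variant records 80 there.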
From: Peter Zijlstra <peterz(a)infradead.org>
Subject: mm: fix kthread_use_mm() vs TLB invalidate
For SMP systems using IPI-based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:
    CPU0                        CPU1
    flush_tlb_mm(mm)            use_mm(mm)
      <send-IPI>
                                  tsk->active_mm = mm;
                                  <IPI>
                                    if (tsk->active_mm == mm)
                                      // flush TLBs
                                  </IPI>
                                  switch_mm(old_mm,mm,tsk);
Here it is possible for the IPI to flush the TLBs for @old_mm rather
than @mm, because the IPI lands before we have actually switched.
Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().
Of the (SMP) architectures that have IPI-based TLB invalidation:
Alpha - checks active_mm
ARC - ASID specific
IA64 - checks active_mm
MIPS - ASID specific flush
OpenRISC - shoots down world
PARISC - shoots down world
SH - ASID specific
SPARC - ASID specific
x86 - N/A
xtensa - checks active_mm
So at the very least Alpha, IA64 and Xtensa are suspect.
On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.
[akpm(a)linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/20200721154106.GE10769@hirez.programming.kicks-ass…
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reported-by: Andy Lutomirski <luto(a)amacapital.net>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kthread.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate
+++ a/kernel/kthread.c
@@ -1241,13 +1241,16 @@ void kthread_use_mm(struct mm_struct *mm
WARN_ON_ONCE(tsk->mm);
task_lock(tsk);
+ /* Hold off tlb flush IPIs while switching mm's */
+ local_irq_disable();
active_mm = tsk->active_mm;
if (active_mm != mm) {
mmgrab(mm);
tsk->active_mm = mm;
}
tsk->mm = mm;
- switch_mm(active_mm, mm, tsk);
+ switch_mm_irqs_off(active_mm, mm, tsk);
+ local_irq_enable();
task_unlock(tsk);
#ifdef finish_arch_post_lock_switch
finish_arch_post_lock_switch();
@@ -1276,9 +1279,11 @@ void kthread_unuse_mm(struct mm_struct *
task_lock(tsk);
sync_mm_rss(mm);
+ local_irq_disable();
tsk->mm = NULL;
/* active_mm is still 'mm' */
enter_lazy_tlb(mm, tsk);
+ local_irq_enable();
task_unlock(tsk);
}
EXPORT_SYMBOL_GPL(kthread_unuse_mm);
_
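The race in the diagram, and why disabling interrupts across the two stores closes it, has a rough userspace analogy: a signal handler plays the flush IPI and a blocked signal plays an IPI held off by local_irq_disable(). This is only an analogy under stated assumptions, not kernel code: small integers name address spaces, raise() forces the "IPI" into the window between the active_mm store and the switch, and model_use_mm(), model_init() and the globals are hypothetical. It relies on POSIX delivering a pending unblocked signal before sigprocmask() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t active_mm;     /* models tsk->active_mm */
static volatile sig_atomic_t loaded_mm;     /* mm currently switched in */
static volatile sig_atomic_t flush_target;  /* mm the flush IPI is for */
static volatile sig_atomic_t flushed_mm;    /* mm actually flushed, 0 = none */

/* Like the Alpha/IA64/xtensa handlers: decide by active_mm, but only the
 * currently loaded translations can be flushed. */
static void ipi_handler(int sig)
{
	(void)sig;
	if (active_mm == flush_target)
		flushed_mm = loaded_mm;
}

static void model_init(int old_mm, int flush_for)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = ipi_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);
	active_mm = old_mm;
	loaded_mm = old_mm;
	flush_target = flush_for;
	flushed_mm = 0;
}

/* Models kthread_use_mm(); irqs_off != 0 selects the patched behaviour. */
static void model_use_mm(int mm, int irqs_off)
{
	sigset_t block, old;

	sigemptyset(&block);
	sigemptyset(&old);
	sigaddset(&block, SIGUSR1);
	if (irqs_off)
		sigprocmask(SIG_BLOCK, &block, &old);  /* local_irq_disable() */
	active_mm = mm;
	raise(SIGUSR1);                            /* "IPI" lands in the window */
	loaded_mm = mm;                            /* switch_mm_irqs_off() */
	if (irqs_off)
		sigprocmask(SIG_SETMASK, &old, NULL);  /* local_irq_enable(): IPI runs */
}
```

Without masking, the handler sees the new active_mm but flushes the old address space; with masking, the pending handler runs only after the switch and flushes the right one.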
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/shuffle: don't move pages between zones and don't read garbage memmaps
Especially with memory hotplug, we can have offline sections (with a
garbage memmap) and overlapping zones. We have to make sure to only touch
initialized memmaps (online sections managed by the buddy) and that the
zone matches, to not move pages between zones.
To test if this can actually happen, I added a simple
BUG_ON(page_zone(page_i) != page_zone(page_j));
right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and
onlining the first memory block "online_movable" and the second memory
block "online_kernel", it will trigger the BUG, as both zones (NORMAL and
MOVABLE) overlap.
This might result in all kinds of weird situations (e.g., double
allocations, list corruptions, unmovable allocations ending up in the
movable zone).
Link: http://lkml.kernel.org/r/20200624094741.9918-2-david@redhat.com
Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Wei Yang <richard.weiyang(a)linux.alibaba.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shuffle.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
--- a/mm/shuffle.c~mm-shuffle-dont-move-pages-between-zones-and-dont-read-garbage-memmaps
+++ a/mm/shuffle.c
@@ -58,25 +58,25 @@ module_param_call(shuffle, shuffle_store
* For two pages to be swapped in the shuffle, they must be free (on a
* 'free_area' lru), have the same order, and have the same migratetype.
*/
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(struct zone *zone,
+ unsigned long pfn, int order)
{
- struct page *page;
+ struct page *page = pfn_to_online_page(pfn);
/*
* Given we're dealing with randomly selected pfns in a zone we
* need to ask questions like...
*/
- /* ...is the pfn even in the memmap? */
- if (!pfn_valid_within(pfn))
+ /* ... is the page managed by the buddy? */
+ if (!page)
return NULL;
- /* ...is the pfn in a present section or a hole? */
- if (!pfn_in_present_section(pfn))
+ /* ... is the page assigned to the same zone? */
+ if (page_zone(page) != zone)
return NULL;
/* ...is the page free and currently on a free_area list? */
- page = pfn_to_page(pfn);
if (!PageBuddy(page))
return NULL;
@@ -123,7 +123,7 @@ void __meminit __shuffle_zone(struct zon
* page_j randomly selected in the span @zone_start_pfn to
* @spanned_pages.
*/
- page_i = shuffle_valid_page(i, order);
+ page_i = shuffle_valid_page(z, i, order);
if (!page_i)
continue;
@@ -137,7 +137,7 @@ void __meminit __shuffle_zone(struct zon
j = z->zone_start_pfn +
ALIGN_DOWN(get_random_long() % z->spanned_pages,
order_pages);
- page_j = shuffle_valid_page(j, order);
+ page_j = shuffle_valid_page(z, j, order);
if (page_j && page_j != page_i)
break;
}
_
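The order of checks in the patched shuffle_valid_page() matters: section state is consulted first, so a possibly garbage memmap entry is never read, and the zone is compared before the buddy state. A simplified userspace model of that ordering follows. It is a sketch under assumptions: struct page_model, struct memmap_model and model_valid_page() are hypothetical, a per-pfn `online` array stands in for the section state that pfn_to_online_page() consults, and migratetype handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified memmap entry. */
struct page_model {
	int zone;   /* what page_zone() would report */
	int buddy;  /* PageBuddy(): free and on a free_area list */
	int order;  /* buddy order; meaningful only when buddy != 0 */
};

/* Section state is kept outside the memmap, so it can be checked without
 * reading a possibly uninitialised page entry. */
struct memmap_model {
	const int *online;         /* per pfn: is the containing section online? */
	struct page_model *pages;  /* entries for offline pfns may be garbage */
	size_t nr_pfns;
};

static struct page_model *model_valid_page(const struct memmap_model *m,
					   int zone, size_t pfn, int order)
{
	struct page_model *page;

	/* ...is the page managed by the buddy? (pfn_to_online_page()) */
	if (pfn >= m->nr_pfns || !m->online[pfn])
		return NULL;
	page = &m->pages[pfn];
	/* ...is the page assigned to the same zone? */
	if (page->zone != zone)
		return NULL;
	/* ...is the page free and of the requested order? */
	if (!page->buddy || page->order != order)
		return NULL;
	return page;
}
```

In the overlapping-zone example from the commit message, a randomly chosen pfn inside ZONE_NORMAL's span can belong to ZONE_MOVABLE; the zone comparison is what rejects it.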
On 2020-03-20 12:58, tip-bot2 for Peter Zijlstra wrote:
> The following commit has been merged into the perf/core branch of tip:
>
> Commit-ID: 90c91dfb86d0ff545bd329d3ddd72c147e2ae198
> Gitweb: https://git.kernel.org/tip/90c91dfb86d0ff545bd329d3ddd72c147e2ae198
> Author: Peter Zijlstra <peterz(a)infradead.org>
> AuthorDate: Thu, 05 Mar 2020 13:38:51 +01:00
> Committer: Peter Zijlstra <peterz(a)infradead.org>
> CommitterDate: Fri, 20 Mar 2020 13:06:22 +01:00
>
> perf/core: Fix endless multiplex timer
>
> Kan and Andi reported that we fail to kill rotation when the flexible
> events go empty, but the context does not. XXX moar
>
> Fixes: fd7d55172d1e ("perf/cgroups: Don't rotate events for cgroups unnecessarily")
Can this patch (commit 90c91dfb86d0 ("perf/core: Fix endless multiplex
timer") upstream) be applied to stable please? For PMU drivers built as
modules, the bug can actually kill the system, since the runaway hrtimer
loop keeps calling pmu->{enable,disable} after all the events have been
closed and dropped their references to pmu->module. Thus legitimately
unloading the module once things have got into this state quickly
results in a crash when those callbacks disappear.
(FWIW I spent about two days fighting with this while testing a new
driver as a module against the 5.3 kernel installed on someone else's
machine, assuming it was a bug in my code...)
Robin.
> Reported-by: Andi Kleen <ak(a)linux.intel.com>
> Reported-by: Kan Liang <kan.liang(a)linux.intel.com>
> Tested-by: Kan Liang <kan.liang(a)linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
> Link: https://lkml.kernel.org/r/20200305123851.GX2596@hirez.programming.kicks-ass…
> ---
> kernel/events/core.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index ccf8d4f..b5a68d2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2291,6 +2291,7 @@ __perf_remove_from_context(struct perf_event *event,
>
> if (!ctx->nr_events && ctx->is_active) {
> ctx->is_active = 0;
> + ctx->rotate_necessary = 0;
> if (ctx->task) {
> WARN_ON_ONCE(cpuctx->task_ctx != ctx);
> cpuctx->task_ctx = NULL;
> @@ -3188,12 +3189,6 @@ static void ctx_sched_out(struct perf_event_context *ctx,
> if (!ctx->nr_active || !(is_active & EVENT_ALL))
> return;
>
> - /*
> - * If we had been multiplexing, no rotations are necessary, now no events
> - * are active.
> - */
> - ctx->rotate_necessary = 0;
> -
> perf_pmu_disable(ctx->pmu);
> if (is_active & EVENT_PINNED) {
> list_for_each_entry_safe(event, tmp, &ctx->pinned_active, active_list)
> @@ -3203,6 +3198,13 @@ static void ctx_sched_out(struct perf_event_context *ctx,
> if (is_active & EVENT_FLEXIBLE) {
> list_for_each_entry_safe(event, tmp, &ctx->flexible_active, active_list)
> group_sched_out(event, cpuctx, ctx);
> +
> + /*
> + * Since we cleared EVENT_FLEXIBLE, also clear
> + * rotate_necessary, is will be reset by
> + * ctx_flexible_sched_in() when needed.
> + */
> + ctx->rotate_necessary = 0;
> }
> perf_pmu_enable(ctx->pmu);
> }
> @@ -3985,6 +3987,12 @@ ctx_event_to_rotate(struct perf_event_context *ctx)
> typeof(*event), group_node);
> }
>
> + /*
> + * Unconditionally clear rotate_necessary; if ctx_flexible_sched_in()
> + * finds there are unschedulable events, it will set it again.
> + */
> + ctx->rotate_necessary = 0;
> +
> return event;
> }
>
>
When running `make coccicheck` in report mode using the
add_namespace.cocci file, it will fail for files that contain
MODULE_LICENSE. Those match the replacement precondition, but spatch
errors out as virtual.ns is not set.
In order to fix that, add the virtual rule nsdeps and only do search and
replace if that rule has been explicitly requested.
In order to make spatch happy in report mode, we also need a dummy rule,
as otherwise it errors out with "No rules apply". Using a script:python
rule appears unrelated and odd, but this is the shortest I could come up
with.
Adjust scripts/nsdeps accordingly to set the nsdeps rule when run through
`make nsdeps`.
Suggested-by: Julia Lawall <julia.lawall(a)inria.fr>
Fixes: c7c4e29fb5a4 ("scripts: add_namespace: Fix coccicheck failed")
Cc: YueHaibing <yuehaibing(a)huawei.com>
Cc: jeyu(a)kernel.org
Cc: cocci(a)systeme.lip6.fr
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthias Maennich <maennich(a)google.com>
---
scripts/coccinelle/misc/add_namespace.cocci | 8 +++++++-
scripts/nsdeps | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/scripts/coccinelle/misc/add_namespace.cocci b/scripts/coccinelle/misc/add_namespace.cocci
index 99e93a6c2e24..cbf1614163cb 100644
--- a/scripts/coccinelle/misc/add_namespace.cocci
+++ b/scripts/coccinelle/misc/add_namespace.cocci
@@ -6,6 +6,7 @@
/// add a missing namespace tag to a module source file.
///
+virtual nsdeps
virtual report
@has_ns_import@
@@ -16,10 +17,15 @@ MODULE_IMPORT_NS(ns);
// Add missing imports, but only adjacent to a MODULE_LICENSE statement.
// That ensures we are adding it only to the main module source file.
-@do_import depends on !has_ns_import@
+@do_import depends on !has_ns_import && nsdeps@
declarer name MODULE_LICENSE;
expression license;
identifier virtual.ns;
@@
MODULE_LICENSE(license);
+ MODULE_IMPORT_NS(ns);
+
+// Dummy rule for report mode that would otherwise be empty and make spatch
+// fail ("No rules apply.")
+@script:python depends on report@
+@@
diff --git a/scripts/nsdeps b/scripts/nsdeps
index 03a8e7cbe6c7..dab4c1a0e27d 100644
--- a/scripts/nsdeps
+++ b/scripts/nsdeps
@@ -29,7 +29,7 @@ fi
generate_deps_for_ns() {
$SPATCH --very-quiet --in-place --sp-file \
- $srctree/scripts/coccinelle/misc/add_namespace.cocci -D ns=$1 $2
+ $srctree/scripts/coccinelle/misc/add_namespace.cocci -D nsdeps -D ns=$1 $2
}
generate_deps() {
--
2.27.0.rc2.251.g90737beb825-goog
This is the start of the stable review cycle for the 4.14.193 release.
There are 8 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 Aug 2020 15:34:53 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.193-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.193-rc1
Geert Uytterhoeven <geert(a)linux-m68k.org>
ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()
Jiang Ying <jiangying8582(a)126.com>
ext4: fix direct I/O read error
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: move the pseudo-random 32-bit definitions to prandom.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: remove net_rand_state from the latent entropy gcc plugin
Willy Tarreau <w(a)1wt.eu>
random: fix circular include dependency on arm64 after addition of percpu.h
Grygorii Strashko <grygorii.strashko(a)ti.com>
ARM: percpu.h: fix build error
Willy Tarreau <w(a)1wt.eu>
random32: update the net random state on interrupt and activity
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "scsi: libsas: direct call probe and destruct"
-------------
Diffstat:
Makefile | 4 +-
arch/arm/include/asm/percpu.h | 2 +
arch/arm/kernel/head-common.S | 1 +
drivers/char/random.c | 1 +
drivers/scsi/libsas/sas_ata.c | 1 +
drivers/scsi/libsas/sas_discover.c | 32 +++++++---------
drivers/scsi/libsas/sas_expander.c | 8 ++--
drivers/scsi/libsas/sas_internal.h | 1 -
drivers/scsi/libsas/sas_port.c | 3 --
fs/ext4/inode.c | 5 +++
include/linux/prandom.h | 78 ++++++++++++++++++++++++++++++++++++++
include/linux/random.h | 63 ++----------------------------
include/scsi/libsas.h | 3 +-
include/scsi/scsi_transport_sas.h | 1 -
kernel/time/timer.c | 8 ++++
lib/random32.c | 2 +-
16 files changed, 124 insertions(+), 89 deletions(-)
This is the start of the stable review cycle for the 5.4.57 release.
There are 9 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 Aug 2020 15:34:53 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.57-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.57-rc1
Lorenz Bauer <lmb(a)cloudflare.com>
bpf: sockmap: Require attach_bpf_fd when detaching a program
Lorenz Bauer <lmb(a)cloudflare.com>
selftests: bpf: Fix detach from sockmap tests
Jiang Ying <jiangying8582(a)126.com>
ext4: fix direct I/O read error
Marc Zyngier <maz(a)kernel.org>
arm64: Workaround circular dependency in pointer_auth.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: move the pseudo-random 32-bit definitions to prandom.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: remove net_rand_state from the latent entropy gcc plugin
Willy Tarreau <w(a)1wt.eu>
random: fix circular include dependency on arm64 after addition of percpu.h
Grygorii Strashko <grygorii.strashko(a)ti.com>
ARM: percpu.h: fix build error
Willy Tarreau <w(a)1wt.eu>
random32: update the net random state on interrupt and activity
-------------
Diffstat:
Makefile | 4 +-
arch/arm/include/asm/percpu.h | 2 +
arch/arm64/include/asm/pointer_auth.h | 8 +++-
drivers/char/random.c | 1 +
fs/ext4/inode.c | 5 +++
include/linux/bpf.h | 13 +++++-
include/linux/prandom.h | 78 +++++++++++++++++++++++++++++++++
include/linux/random.h | 63 ++------------------------
include/linux/skmsg.h | 13 ++++++
kernel/bpf/syscall.c | 4 +-
kernel/time/timer.c | 8 ++++
lib/random32.c | 2 +-
net/core/sock_map.c | 50 ++++++++++++++++++---
tools/testing/selftests/bpf/test_maps.c | 12 ++---
14 files changed, 185 insertions(+), 78 deletions(-)
If we hit an earlier error path in io_uring_create(), then we will have
accounted memory, but not set ctx->{sq,cq}_entries yet. Then when the
ring is torn down in error, we use those values to unaccount the memory.
Ensure we set the ctx entries before we're able to hit a potential error
path.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
fs/io_uring.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8f96566603f3..0d857f7ca507 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8193,6 +8193,10 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx,
struct io_rings *rings;
size_t size, sq_array_offset;
+ /* make sure these are sane, as we already accounted them */
+ ctx->sq_entries = p->sq_entries;
+ ctx->cq_entries = p->cq_entries;
+
size = rings_size(p->sq_entries, p->cq_entries, &sq_array_offset);
if (size == SIZE_MAX)
return -EOVERFLOW;
@@ -8209,8 +8213,6 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx,
rings->cq_ring_entries = p->cq_entries;
ctx->sq_mask = rings->sq_ring_mask;
ctx->cq_mask = rings->cq_ring_mask;
- ctx->sq_entries = rings->sq_ring_entries;
- ctx->cq_entries = rings->cq_ring_entries;
size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
if (size == SIZE_MAX) {
--
2.28.0
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
On exit, if a process is preempted after the trace_sched_process_exit()
tracepoint but before the process is done exiting, then when it gets
scheduled in, the function tracers will not filter it properly against the
function tracing pid filters.
That is because the function tracing pid filter hooks into the
sched_process_exit() tracepoint to remove the exiting task's pid from the
filter list. Because the filtering happens at the sched_switch tracepoint,
when the exiting task schedules back in to finish up the exit, it will no
longer be in the function pid filtering tables.
This was noticeable in the notrace self tests on a preemptible kernel: the
exiting task was preempted after being taken off the notrace filter table,
so when it was scheduled back in it was no longer in the notrace list, and
the tail of the exit path was traced. The test detected this and failed.
Cc: stable(a)vger.kernel.org
Cc: Namhyung Kim <namhyung(a)kernel.org>
Fixes: 1e10486ffee0a ("ftrace: Add 'function-fork' trace option")
Fixes: c37775d57830a ("tracing: Add infrastructure to allow set_event_pid to follow children")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 4 ++--
kernel/trace/trace_events.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4e3a5d79c078..76f2dd6fd414 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6985,12 +6985,12 @@ void ftrace_pid_follow_fork(struct trace_array *tr, bool enable)
if (enable) {
register_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
tr);
- register_trace_sched_process_exit(ftrace_pid_follow_sched_process_exit,
+ register_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
tr);
} else {
unregister_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
tr);
- unregister_trace_sched_process_exit(ftrace_pid_follow_sched_process_exit,
+ unregister_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
tr);
}
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f6f55682d3e2..a85effb2373b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -538,12 +538,12 @@ void trace_event_follow_fork(struct trace_array *tr, bool enable)
if (enable) {
register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
tr, INT_MIN);
- register_trace_prio_sched_process_exit(event_filter_pid_sched_process_exit,
+ register_trace_prio_sched_process_free(event_filter_pid_sched_process_exit,
tr, INT_MAX);
} else {
unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
tr);
- unregister_trace_sched_process_exit(event_filter_pid_sched_process_exit,
+ unregister_trace_sched_process_free(event_filter_pid_sched_process_exit,
tr);
}
}
--
2.26.2
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Since parse_args() stops parsing at '--', bootconfig_params() will never
get '--' as a param, and initargs_found is never set to true.
As a result, if we pass some init arguments via the bootconfig, those
are always appended to the kernel command line with '--', even if the
kernel command line already has '--'.
To fix this correctly, check the return value of parse_args() and set
initargs_found to true if the return value is not an error but a valid
address.
Link: https://lkml.kernel.org/r/159650953285.270383.14822353843556363851.stgit@de…
Fixes: f61872bb58a1 ("bootconfig: Use parse_args() to find bootconfig and '--'")
Cc: stable(a)vger.kernel.org
Reported-by: Arvind Sankar <nivedita(a)alum.mit.edu>
Suggested-by: Arvind Sankar <nivedita(a)alum.mit.edu>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
init/main.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5a..883ded3638e5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -387,8 +387,6 @@ static int __init bootconfig_params(char *param, char *val,
{
if (strcmp(param, "bootconfig") == 0) {
bootconfig_found = true;
- } else if (strcmp(param, "--") == 0) {
- initargs_found = true;
}
return 0;
}
@@ -399,19 +397,23 @@ static void __init setup_boot_config(const char *cmdline)
const char *msg;
int pos;
u32 size, csum;
- char *data, *copy;
+ char *data, *copy, *err;
int ret;
/* Cut out the bootconfig data even if we have no bootconfig option */
data = get_boot_config_from_initrd(&size, &csum);
strlcpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
- parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
- bootconfig_params);
+ err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
+ bootconfig_params);
- if (!bootconfig_found)
+ if (IS_ERR(err) || !bootconfig_found)
return;
+ /* parse_args() stops at '--' and returns an address */
+ if (err)
+ initargs_found = true;
+
if (!data) {
pr_err("'bootconfig' found on command line, but no bootconfig found\n");
return;
--
2.26.2
On Thu, Aug 06, 2020 at 08:04:07PM +0800, 姜迎 wrote:
>
>
>
> Hi all,
> This patch is used to fix checkpatch error on kernel stable rc 4.9.
> I have built pass and tested pass, thanks!
Now queued up, thanks!
greg k-h
This patch fixes an ext4 direct I/O read error that occurs when
the read size is not aligned with the block size.
The following test demonstrates the error.
(1) Create a file whose size is not aligned with the block size:
$ dd if=/dev/zero of=./test.jar bs=1000 count=3
(2) I wrote a source file named "direct_io_read_file.c" as follows:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 1024

int main(void)
{
	int fd;
	int ret;
	ssize_t n;
	unsigned char *buf;

	/* O_DIRECT needs a suitably aligned user buffer. */
	ret = posix_memalign((void **)&buf, 512, BUF_SIZE);
	if (ret) {
		/* posix_memalign() returns the error instead of setting errno. */
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		exit(1);
	}

	fd = open("./test.jar", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open ./test.jar failed");
		exit(1);
	}

	/* Read until EOF (0) or an error (-1). */
	do {
		n = read(fd, buf, BUF_SIZE);
		printf("ret=%zd\n", n);
		if (n < 0)
			perror("read test.jar failed");
	} while (n > 0);

	free(buf);
	close(fd);
	return 0;
}
(3) Compile the source file:
$ gcc direct_io_read_file.c -D_GNU_SOURCE
(4) Run the test program:
$ ./a.out
The result is as follows:
ret=1024
ret=1024
ret=952
ret=-1
read test.jar failed: Invalid argument
I have also tested this program on an XFS filesystem. XFS does not have
this problem, because XFS uses iomap_dio_rw() for direct I/O reads, and
the comparison between the read offset and the file size is done in
iomap_dio_rw(). The code is as follows:
if (pos < size) {
retval = filemap_write_and_wait_range(mapping, pos,
pos + iov_length(iov, nr_segs) - 1);
if (!retval) {
retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
}
...
}
...so direct I/O is only performed when "pos < size"; otherwise 0 is returned.
I have tested the fix on Ext4, and the behavior is consistent with the
description of EINVAL in read(2):
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
EINVAL
fd is attached to an object which is unsuitable for reading;
or the file was opened with the O_DIRECT flag, and either the
address specified in buf, the value specified in count, or the
current file offset is not suitably aligned.
So I think this patch can be applied to fix the ext4 direct I/O error.
However, Ext4 switched to direct I/O reads using the iomap infrastructure
in kernel 5.5, via commit <b1b4705d54ab>
("ext4: introduce direct I/O read using iomap infrastructure").
Since then Ext4, like XFS, uses iomap_dio_rw() for direct I/O reads,
so this problem does not exist for Ext4 on kernel 5.5.
From the above description, we can see this problem exists in all kernel
versions between 3.14 and 5.4, and it causes applications to fail to
read. For example, when a search service downloads a new full index file
while the search engine is still loading the previous index file and
processing search requests, it cannot use buffered I/O, which may evict
the previous, in-use index file from the page cache, so the search
service must use direct I/O reads.
Please apply this patch to these kernel versions, or use the approach
taken in kernel 5.5 to fix this problem.
Fixes: 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Wang Long <wanglong19(a)meituan.com>
Signed-off-by: Jiang Ying <jiangying8582(a)126.com>
Changes since V5:
Fix checkpatch error on kernel stable rc 4.9, based on V3.
Use "Reviewed-by" instead of "Co-developed-by" to fix a
checkpatch error.
Changes since V4:
Fix build error on kernel stable rc 4.4, based on V3.
This patch is only for kernel 4.4.
Changes since V3:
Add the information that this bug could break some applications that
use the stable kernel releases.
Changes since V2:
Improve the description in the commit message and restructure
the patch, e.g.:
Before:
loff_t size;
size = i_size_read(inode);
After:
loff_t size = i_size_read(inode);
Changes since V1:
Use real name in Signed-off-by and add a "Fixes:" tag.
---
fs/ext4/inode.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d8780e0..ccce89d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3575,6 +3575,11 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
ssize_t ret;
+ loff_t offset = iocb->ki_pos;
+ loff_t size = i_size_read(inode);
+
+ if (offset >= size)
+ return 0;
/*
* Shared inode_lock is enough for us - it protects against concurrent
--
1.8.3.1
This patch fixes an ext4 direct I/O read error that occurs when
the read size is not aligned with the block size.
The following test demonstrates the error.
(1) Create a file whose size is not aligned with the block size:
$ dd if=/dev/zero of=./test.jar bs=1000 count=3
(2) I wrote a source file named "direct_io_read_file.c" as follows:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 1024

int main(void)
{
	int fd;
	int ret;
	ssize_t n;
	unsigned char *buf;

	/* O_DIRECT needs a suitably aligned user buffer. */
	ret = posix_memalign((void **)&buf, 512, BUF_SIZE);
	if (ret) {
		/* posix_memalign() returns the error instead of setting errno. */
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		exit(1);
	}

	fd = open("./test.jar", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open ./test.jar failed");
		exit(1);
	}

	/* Read until EOF (0) or an error (-1). */
	do {
		n = read(fd, buf, BUF_SIZE);
		printf("ret=%zd\n", n);
		if (n < 0)
			perror("read test.jar failed");
	} while (n > 0);

	free(buf);
	close(fd);
	return 0;
}
(3) Compile the source file:
$ gcc direct_io_read_file.c -D_GNU_SOURCE
(4) Run the test program:
$ ./a.out
The result is as follows:
ret=1024
ret=1024
ret=952
ret=-1
read test.jar failed: Invalid argument
I have also tested this program on an XFS filesystem. XFS does not have
this problem, because XFS uses iomap_dio_rw() for direct I/O reads, and
the comparison between the read offset and the file size is done in
iomap_dio_rw(). The code is as follows:
if (pos < size) {
retval = filemap_write_and_wait_range(mapping, pos,
pos + iov_length(iov, nr_segs) - 1);
if (!retval) {
retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
}
...
}
...so direct I/O is only performed when "pos < size"; otherwise 0 is returned.
I have tested the fix on Ext4, and the behavior is consistent with the
description of EINVAL in read(2):
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
EINVAL
fd is attached to an object which is unsuitable for reading;
or the file was opened with the O_DIRECT flag, and either the
address specified in buf, the value specified in count, or the
current file offset is not suitably aligned.
So I think this patch can be applied to fix the ext4 direct I/O error.
However, Ext4 switched to direct I/O reads using the iomap infrastructure
in kernel 5.5, via commit <b1b4705d54ab>
("ext4: introduce direct I/O read using iomap infrastructure").
Since then Ext4, like XFS, uses iomap_dio_rw() for direct I/O reads,
so this problem does not exist for Ext4 on kernel 5.5.
From the above description, we can see this problem exists in all kernel
versions between 3.14 and 5.4, and it causes applications to fail to
read. For example, when a search service downloads a new full index file
while the search engine is still loading the previous index file and
processing search requests, it cannot use buffered I/O, which may evict
the previous, in-use index file from the page cache, so the search
service must use direct I/O reads.
Please apply this patch to these kernel versions, or use the approach
taken in kernel 5.5 to fix this problem.
Fixes: 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Wang Long <wanglong19(a)meituan.com>
Signed-off-by: Jiang Ying <jiangying8582(a)126.com>
Changes since V4:
Fix build error on kernel stable rc 4.4.
This patch is only for kernel 4.4.
Changes since V3:
Add the information that this bug could break some applications that
use the stable kernel releases.
Changes since V2:
Improve the description in the commit message and restructure
the patch, e.g.:
Before:
loff_t size;
size = i_size_read(inode);
After:
loff_t size = i_size_read(inode);
Changes since V1:
Use real name in Signed-off-by and add a "Fixes:" tag.
---
fs/ext4/inode.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8e79970..8816016 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3353,6 +3353,13 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
size_t count = iov_iter_count(iter);
ssize_t ret;
+ if (iov_iter_rw(iter) == READ) {
+ loff_t size = i_size_read(inode);
+
+ if (offset >= size)
+ return 0;
+ }
+
#ifdef CONFIG_EXT4_FS_ENCRYPTION
if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode))
return 0;
--
1.8.3.1
The stable-rc 4.4 build breaks on arm64, arm, x86_64 and i386.
Here is the build log for the arm64 failure.
git_repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
target_arch: arm64
toolchain: gcc-9
git_short_log: 0b3898baf614 ("Linux 4.4.233-rc1")
git_sha: 0b3898baf61459e1f963dcf893b4683174668975
git_describe: v4.4.232-33-g0b3898baf614
kernel_version: 4.4.233-rc1
make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=gcc CC="sccache
aarch64-linux-gnu-gcc" O=build Image
#
../arch/arm64/kernel/hw_breakpoint.c: In function ‘arch_bp_generic_fields’:
../arch/arm64/kernel/hw_breakpoint.c:348:5: note: parameter passing
for argument of type ‘struct arch_hw_breakpoint_ctrl’ changed in GCC
9.1
348 | int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
| ^~~~~~~~~~~~~~~~~~~~~~
../fs/ext4/inode.c: In function ‘ext4_direct_IO’:
../fs/ext4/inode.c:3355:9: error: ‘offset’ redeclared as different
kind of symbol
3355 | loff_t offset = iocb->ki_pos;
| ^~~~~~
../fs/ext4/inode.c:3349:17: note: previous definition of ‘offset’ was here
3349 | loff_t offset)
| ~~~~~~~^~~~~~
make[3]: *** [../scripts/Makefile.build:277: fs/ext4/inode.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [../scripts/Makefile.build:484: fs/ext4] Error 2
../drivers/net/ethernet/apm/xgene/xgene_enet_main.c:32:36: warning:
array ‘xgene_enet_acpi_match’ assumed to have one element
32 | static const struct acpi_device_id xgene_enet_acpi_match[];
| ^~~~~~~~~~~~~~~~~~~~~
make[2]: Target '__build' not remade because of errors.
make[1]: *** [/linux/Makefile:1006: fs] Error 2
make[1]: Target 'Image' not remade because of errors.
make: *** [Makefile:152: sub-make] Error 2
make: Target 'Image' not remade because of errors.
--
Linaro LKFT
https://lkft.linaro.org
From: Mike Rapoport <rppt(a)linux.ibm.com>
When a configuration has NUMA disabled and SGI_IP27 enabled, the build
fails:
CC kernel/bounds.s
CC arch/mips/kernel/asm-offsets.s
In file included from arch/mips/include/asm/topology.h:11,
from include/linux/topology.h:36,
from include/linux/gfp.h:9,
from include/linux/slab.h:15,
from include/linux/crypto.h:19,
from include/crypto/hash.h:11,
from include/linux/uio.h:10,
from include/linux/socket.h:8,
from include/linux/compat.h:15,
from arch/mips/kernel/asm-offsets.c:12:
include/linux/topology.h: In function 'numa_node_id':
arch/mips/include/asm/mach-ip27/topology.h:16:27: error: implicit declaration of function 'cputonasid'; did you mean 'cpu_vpe_id'? [-Werror=implicit-function-declaration]
#define cpu_to_node(cpu) (cputonasid(cpu))
^~~~~~~~~~
include/linux/topology.h:119:9: note: in expansion of macro 'cpu_to_node'
return cpu_to_node(raw_smp_processor_id());
^~~~~~~~~~~
include/linux/topology.h: In function 'cpu_cpu_mask':
arch/mips/include/asm/mach-ip27/topology.h:19:7: error: implicit declaration of function 'hub_data' [-Werror=implicit-function-declaration]
&hub_data(node)->h_cpus)
^~~~~~~~
include/linux/topology.h:210:9: note: in expansion of macro 'cpumask_of_node'
return cpumask_of_node(cpu_to_node(cpu));
^~~~~~~~~~~~~~~
arch/mips/include/asm/mach-ip27/topology.h:19:21: error: invalid type argument of '->' (have 'int')
&hub_data(node)->h_cpus)
^~
include/linux/topology.h:210:9: note: in expansion of macro 'cpumask_of_node'
return cpumask_of_node(cpu_to_node(cpu));
^~~~~~~~~~~~~~~
Before the switch from DISCONTIGMEM to SPARSEMEM, CONFIG_NEED_MULTIPLE_NODES=y
was always set, because it was selected by DISCONTIGMEM.
Without DISCONTIGMEM it is possible to have SPARSEMEM without NUMA for
SGI_IP27, and since many things there rely on the custom node definitions,
the build breaks.
As Thomas noted, "... there are right now too many places in IP27 code,
which assumes NUMA enabled", so the simplest solution is to always
enable NUMA for SGI-IP27 builds.
Reported-by: kernel test robot <lkp(a)intel.com>
Fixes: 397dc00e249e ("mips: sgi-ip27: switch from DISCONTIGMEM to SPARSEMEM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
---
arch/mips/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 6fee1a133e9d..a7e40bb1e5bc 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -678,6 +678,7 @@ config SGI_IP27
select SYS_SUPPORTS_NUMA
select SYS_SUPPORTS_SMP
select MIPS_L1_CACHE_SHIFT_7
+ select NUMA
help
This are the SGI Origin 200, Origin 2000 and Onyx 2 Graphics
workstations. To compile a Linux kernel that runs on these, say Y
--
2.26.2
The stable-rc 4.9 build breaks on arm64, arm, x86_64 and i386.
Here is the build log for the arm64 failure.
git_repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
target_arch: arm64
toolchain: gcc-9
git_short_log: 1f47445197d2 ("Linux 4.9.233-rc1")
git_sha: 1f47445197d2c8eecafa2b996f635aa89851c123
git_describe: v4.9.232-51-g1f47445197d2
kernel_version: 4.9.233-rc1
make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=gcc CC="sccache
aarch64-linux-gnu-gcc" O=build Image
#
../arch/arm64/kernel/hw_breakpoint.c: In function ‘arch_bp_generic_fields’:
../arch/arm64/kernel/hw_breakpoint.c:352:5: note: parameter passing
for argument of type ‘struct arch_hw_breakpoint_ctrl’ changed in GCC
9.1
352 | int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
| ^~~~~~~~~~~~~~~~~~~~~~
../fs/ext4/inode.c: In function ‘ext4_direct_IO’:
../fs/ext4/inode.c:3610:9: error: redefinition of ‘offset’
3610 | loff_t offset = iocb->ki_pos;
| ^~~~~~
../fs/ext4/inode.c:3608:9: note: previous definition of ‘offset’ was here
3608 | loff_t offset = iocb->ki_pos;
| ^~~~~~
make[3]: *** [../scripts/Makefile.build:304: fs/ext4/inode.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [../scripts/Makefile.build:555: fs/ext4] Error 2
../lib/vsprintf.c: In function ‘number’:
../lib/vsprintf.c:399:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
399 | char *number(char *buf, char *end, unsigned long long num,
| ^~~~~~
../lib/vsprintf.c: In function ‘widen_string’:
../lib/vsprintf.c:562:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
562 | char *widen_string(char *buf, int n, char *end, struct printf_spec spec)
| ^~~~~~~~~~~~
../lib/vsprintf.c: In function ‘string’:
../lib/vsprintf.c:583:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
583 | char *string(char *buf, char *end, const char *s, struct
printf_spec spec)
| ^~~~~~
../lib/vsprintf.c: In function ‘hex_string’:
../lib/vsprintf.c:803:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
803 | char *hex_string(char *buf, char *end, u8 *addr, struct
printf_spec spec,
| ^~~~~~~~~~
../lib/vsprintf.c: In function ‘mac_address_string’:
../lib/vsprintf.c:936:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
936 | char *mac_address_string(char *buf, char *end, u8 *addr,
| ^~~~~~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘ip4_addr_string’:
../lib/vsprintf.c:1137:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1137 | char *ip4_addr_string(char *buf, char *end, const u8 *addr,
| ^~~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘uuid_string’:
../lib/vsprintf.c:1305:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1305 | char *uuid_string(char *buf, char *end, const u8 *addr,
| ^~~~~~~~~~~
../lib/vsprintf.c: In function ‘symbol_string’:
../lib/vsprintf.c:668:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
668 | char *symbol_string(char *buf, char *end, void *ptr,
| ^~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘resource_string’:
../lib/vsprintf.c:695:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
695 | char *resource_string(char *buf, char *end, struct resource *res,
| ^~~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘ip6_addr_string’:
../lib/vsprintf.c:1123:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1123 | char *ip6_addr_string(char *buf, char *end, const u8 *addr,
| ^~~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘ip4_addr_string_sa’:
../lib/vsprintf.c:1210:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1210 | char *ip4_addr_string_sa(char *buf, char *end, const struct
sockaddr_in *sa,
| ^~~~~~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘ip6_addr_string_sa’:
../lib/vsprintf.c:1148:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1148 | char *ip6_addr_string_sa(char *buf, char *end, const struct
sockaddr_in6 *sa,
| ^~~~~~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘escaped_string’:
../lib/vsprintf.c:1245:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1245 | char *escaped_string(char *buf, char *end, u8 *addr, struct
printf_spec spec,
| ^~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘dentry_name’:
../lib/vsprintf.c:604:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
604 | char *dentry_name(char *buf, char *end, const struct dentry
*d, struct printf_spec spec,
| ^~~~~~~~~~~
../lib/vsprintf.c: In function ‘clock.isra.0’:
../lib/vsprintf.c:1387:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1387 | char *clock(char *buf, char *end, struct clk *clk, struct
printf_spec spec,
| ^~~~~
../lib/vsprintf.c: In function ‘bitmap_list_string.isra.0’:
../lib/vsprintf.c:896:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
896 | char *bitmap_list_string(char *buf, char *end, unsigned long *bitmap,
| ^~~~~~~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘bitmap_string.isra.0’:
../lib/vsprintf.c:855:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
855 | char *bitmap_string(char *buf, char *end, unsigned long *bitmap,
| ^~~~~~~~~~~~~
../lib/vsprintf.c: In function ‘bdev_name.isra.0’:
../lib/vsprintf.c:649:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
649 | char *bdev_name(char *buf, char *end, struct block_device *bdev,
| ^~~~~~~~~
../lib/vsprintf.c: In function ‘pointer’:
../lib/vsprintf.c:1571:7: note: parameter passing for argument of type
‘struct printf_spec’ changed in GCC 9.1
1571 | char *pointer(const char *fmt, char *buf, char *end, void *ptr,
| ^~~~~~~
make[2]: Target '__build' not remade because of errors.
make[1]: *** [/linux/Makefile:1036: fs] Error 2
make[1]: Target 'Image' not remade because of errors.
make: *** [Makefile:152: sub-make] Error 2
make: Target 'Image' not remade because of errors.
--
Linaro LKFT
https://lkft.linaro.org
There are a few issues in the DWC3 driver when preparing TRBs.
The driver needs to account for the following:
* MPS alignment for ZLP OUT direction
* Extra TRBs when checking for available TRBs
* SG entry sizes larger than the request length
Along with these fixes, there are some cleanup/refactoring patches in this
series.
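To illustrate the second item: a room check that counts only the data TRB can accept a request that does not actually fit. The sketch below is a hypothetical model, not code from drivers/usb/dwc3/gadget.c. The helper names are invented, and it assumes one data TRB per linear (non-SG) request, plus one extra TRB either for an OUT transfer whose length is not a multiple of maxpacket (padding to MPS alignment) or for an IN transfer with the zero flag whose length is an exact multiple of maxpacket (trailing ZLP).

```c
#include <stdbool.h>

/* Hypothetical model: TRBs occupied by one linear request. */
static unsigned int hyp_trbs_for_request(bool is_out, unsigned int length,
					 unsigned int maxp, bool zero)
{
	unsigned int trbs = 1;	/* the data TRB itself */

	/* OUT: assume an extra TRB pads the transfer up to MPS alignment */
	if (is_out && length % maxp != 0)
		trbs++;

	/* IN with zero flag and MPS-multiple length: extra TRB for the ZLP */
	if (!is_out && zero && length != 0 && length % maxp == 0)
		trbs++;

	return trbs;
}

/* The availability check must include the extra TRBs, not just one. */
static bool hyp_can_queue(unsigned int free_trbs, bool is_out,
			  unsigned int length, unsigned int maxp, bool zero)
{
	return free_trbs >= hyp_trbs_for_request(is_out, length, maxp, zero);
}
```

With a 512-byte maxpacket, a 1000-byte OUT request needs two TRBs in this model, so a ring with a single free slot must reject it.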
Thinh Nguyen (7):
usb: dwc3: gadget: Don't setup more than requested
usb: dwc3: gadget: Fix handling ZLP
usb: dwc3: gadget: Handle ZLP for sg requests
usb: dwc3: gadget: Refactor preparing TRBs
usb: dwc3: gadget: Account for extra TRB
usb: dwc3: gadget: Rename misleading function names
usb: dwc3: ep0: Skip ZLP setup for OUT
drivers/usb/dwc3/ep0.c | 2 +-
drivers/usb/dwc3/gadget.c | 232 ++++++++++++++++++++++----------------
2 files changed, 137 insertions(+), 97 deletions(-)
base-commit: e3ee0e740c3887d2293e8d54a8707218d70d86ca
--
2.28.0
This patch fixes an ext4 direct I/O read error that occurs when
the read size is not aligned with the block size.
The following test illustrates the error.
(1) Create a file whose size is not aligned with the block size:
$dd if=/dev/zero of=./test.jar bs=1000 count=3
(2) I wrote a source file named "direct_io_read_file.c" as follows:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/types.h>
#define BUF_SIZE 1024
int main(void)
{
	int fd;
	int ret;
	unsigned char *buf;
	/* O_DIRECT needs a suitably aligned user buffer */
	ret = posix_memalign((void **)&buf, 512, BUF_SIZE);
	if (ret) {
		/* posix_memalign() returns the error instead of setting errno */
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		exit(1);
	}
	fd = open("./test.jar", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open ./test.jar failed");
		exit(1);
	}
	/* read until EOF (0) or an error (-1) */
	do {
		ret = read(fd, buf, BUF_SIZE);
		printf("ret=%d\n", ret);
		if (ret < 0)
			perror("read test.jar failed");
	} while (ret > 0);
	free(buf);
	close(fd);
	return 0;
}
(3) Compile the source file:
$gcc direct_io_read_file.c -D_GNU_SOURCE
(4) Run the test program:
$./a.out
The result is as follows:
ret=1024
ret=1024
ret=952
ret=-1
read test.jar failed: Invalid argument
I have tested this program on XFS, and XFS does not have
this problem because it uses iomap_dio_rw() for direct I/O
reads, and the read offset is compared with the file size
in iomap_dio_rw(). The code is as follows:
if (pos < size) {
retval = filemap_write_and_wait_range(mapping, pos,
pos + iov_length(iov, nr_segs) - 1);
if (!retval) {
retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
}
...
}
That is, direct I/O is performed only when "pos < size"; otherwise 0 is returned.
I have tested the fix on ext4, and the behavior now conforms to the
description of EINVAL in read(2):
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
EINVAL
fd is attached to an object which is unsuitable for reading;
or the file was opened with the O_DIRECT flag, and either the
address specified in buf, the value specified in count, or the
current file offset is not suitably aligned.
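Reading at or past the end of the file matches none of these conditions,
so read() should return 0 there rather than fail with EINVAL. I therefore
think this patch can be applied to fix the ext4 direct I/O read error.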
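The failing sequence can be modelled in a few lines of userspace C. This is a simplified, hypothetical model rather than ext4 code: it assumes a 512-byte alignment requirement for O_DIRECT (the real requirement depends on the filesystem block size and the device's logical block size), and model_dio_read() and replay_reads() are invented names. Replaying the test program's loop against a 3000-byte file gives 1024, 1024, 952, -EINVAL without the early end-of-file check, and 1024, 1024, 952, 0 with it.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed O_DIRECT alignment unit for this model only. */
#define DIO_ALIGN 512

/* One O_DIRECT read of 'count' bytes at 'offset' on a file of 'i_size'
 * bytes. 'patched' enables the early EOF check added by the patch,
 * which runs before the alignment check. */
static long model_dio_read(long long offset, long long i_size,
			   long count, int patched)
{
	if (patched && offset >= i_size)
		return 0;			/* new: EOF reported as 0 */
	if (offset % DIO_ALIGN != 0 || count % DIO_ALIGN != 0)
		return -EINVAL;			/* misaligned direct I/O */
	if (offset >= i_size)
		return 0;
	long long remain = i_size - offset;
	return remain < count ? (long)remain : count;	/* short read at EOF */
}

/* Replays the test loop: read until a return value <= 0. Stores each
 * return value in out[] and returns how many values were stored. */
static size_t replay_reads(long long i_size, long bufsize, int patched,
			   long *out, size_t max)
{
	long long pos = 0;
	size_t n = 0;

	while (n < max) {
		long ret = model_dio_read(pos, i_size, bufsize, patched);
		out[n++] = ret;
		if (ret <= 0)
			break;
		pos += ret;
	}
	return n;
}
```

The fourth read starts at offset 3000, which is not 512-byte aligned; only the early check keeps it from reaching the alignment test.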
However, kernel 5.5 converted ext4 direct I/O reads to the iomap
infrastructure in commit b1b4705d54ab ("ext4: introduce direct I/O read
using iomap infrastructure"). Since then ext4, like XFS, uses
iomap_dio_rw() for direct I/O reads, so this problem does not exist
for ext4 on kernel 5.5 and later.
From the above description, this problem exists in all kernel versions
from 3.14 through 5.4, and it causes applications to fail to read. For
example, when a search service downloads a new full index file while the
search engine is still loading the previous index file and serving search
requests, it cannot use buffered I/O, because that could evict the
previous, in-use index file from the page cache; the search service must
therefore use direct I/O reads.
Please apply this patch to these kernel versions, or backport the approach
used in kernel 5.5 to fix this problem.
Fixes: 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
Reviewed-by: Jan Kara <jack(a)suse.cz>
Co-developed-by: Wang Long <wanglong19(a)meituan.com>
Signed-off-by: Wang Long <wanglong19(a)meituan.com>
Signed-off-by: Jiang Ying <jiangying8582(a)126.com>
Changes since V3:
Add a note that this bug can break applications that use the
stable kernel releases.
Changes since V2:
Improve the commit message and make a small variation in
the patch, e.g.:
Before:
loff_t size;
size = i_size_read(inode);
After:
loff_t size = i_size_read(inode);
Changes since V1:
Use real name in Signed-off-by and add a "Fixes:" tag
---
fs/ext4/inode.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa2..a66b0ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3821,6 +3821,11 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
struct inode *inode = mapping->host;
size_t count = iov_iter_count(iter);
ssize_t ret;
+ loff_t offset = iocb->ki_pos;
+ loff_t size = i_size_read(inode);
+
+ if (offset >= size)
+ return 0;
/*
* Shared inode_lock is enough for us - it protects against concurrent
--
1.8.3.1
Hi
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag
fixing commit: e9df09428996 ("KVM: SVM: Add sev module_param").
The bot has tested the following trees: v5.7.11, v5.4.54, v4.19.135.
v5.7.11: Failed to apply! Possible dependencies:
3bae0459bcd5 ("KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enums")
b2f432f872d9 ("KVM: x86/mmu: Tweak PSE hugepage handling to avoid 2M vs 4M conundrum")
e662ec3e0705 ("KVM: x86/mmu: Move max hugepage level to a separate #define")
v5.4.54: Failed to apply! Possible dependencies:
106ee47dc633 ("docs: kvm: Convert api.txt to ReST format")
213e0e1f500b ("KVM: SVM: Refactor logging of NPT enabled/disabled")
3bae0459bcd5 ("KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enums")
3c9bd4006bfc ("KVM: x86: enable dirty log gradually in small chunks")
80b10aa92448 ("Documentation: kvm: Fix mention to number of ioctls classes")
c726200dd106 ("KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace")
cb9b88c66939 ("KVM: x86/mmu: Refactor handling of cache consistency with TDP")
da345174ceca ("KVM: arm/arm64: Allow user injection of external data aborts")
e662ec3e0705 ("KVM: x86/mmu: Move max hugepage level to a separate #define")
v4.19.135: Failed to apply! Possible dependencies:
213e0e1f500b ("KVM: SVM: Refactor logging of NPT enabled/disabled")
3bae0459bcd5 ("KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enums")
44dd3ffa7bb3 ("x86/kvm/mmu: make vcpu->mmu a pointer to the current MMU")
4fef0f491347 ("KVM: x86: move definition PT_MAX_HUGEPAGE_LEVEL and KVM_NR_PAGE_SIZES together")
91e86d225ef3 ("kvm: x86: Add payload operands to kvm_multiple_exception")
c851436a34ca ("kvm: x86: Add has_payload and payload to kvm_queued_exception")
cb9b88c66939 ("KVM: x86/mmu: Refactor handling of cache consistency with TDP")
d647eb63e671 ("KVM: svm: add nrips module parameter")
da998b46d244 ("kvm: x86: Defer setting of CR2 until #PF delivery")
e662ec3e0705 ("KVM: x86/mmu: Move max hugepage level to a separate #define")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
Hi
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag
fixing commit: 6dc5fd93b2f1 ("ARM: 8900/1: UNWINDER_FRAME_POINTER implementation for Clang").
The bot has tested the following trees: v5.7.11, v5.4.54.
v5.7.11: Failed to apply! Possible dependencies:
5489ab50c227 ("arm/asm: add loglvl to c_backtrace()")
637ce97e7c24 ("ARM: backtrace-clang: check for NULL lr")
7c8ef99a0b04 ("ARM: backtrace-clang: add fixup for lr dereference")
v5.4.54: Failed to apply! Possible dependencies:
40ff1ddb5570 ("ARM: 8948/1: Prevent OOB access in stacktrace")
5489ab50c227 ("arm/asm: add loglvl to c_backtrace()")
637ce97e7c24 ("ARM: backtrace-clang: check for NULL lr")
7c8ef99a0b04 ("ARM: backtrace-clang: add fixup for lr dereference")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
This is the start of the stable review cycle for the 5.7.14 release.
There are 6 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 Aug 2020 15:34:53 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.14-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.7.14-rc1
Marc Zyngier <maz(a)kernel.org>
arm64: Workaround circular dependency in pointer_auth.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: move the pseudo-random 32-bit definitions to prandom.h
Linus Torvalds <torvalds(a)linux-foundation.org>
random32: remove net_rand_state from the latent entropy gcc plugin
Willy Tarreau <w(a)1wt.eu>
random: fix circular include dependency on arm64 after addition of percpu.h
Grygorii Strashko <grygorii.strashko(a)ti.com>
ARM: percpu.h: fix build error
Willy Tarreau <w(a)1wt.eu>
random32: update the net random state on interrupt and activity
-------------
Diffstat:
Makefile | 4 +-
arch/arm/include/asm/percpu.h | 2 +
arch/arm64/include/asm/pointer_auth.h | 8 +++-
drivers/char/random.c | 1 +
include/linux/prandom.h | 78 +++++++++++++++++++++++++++++++++++
include/linux/random.h | 63 ++--------------------------
kernel/time/timer.c | 8 ++++
lib/random32.c | 2 +-
8 files changed, 103 insertions(+), 63 deletions(-)
Hi,
all x86 and x86_64 images fail to boot in v4.4.y-queue (v4.4.232-30-g52247eb).
Bisect results below. Reverting both 3bc53626ab45 and d1c993b94751
fixes the problem.
Guenter
---
# bad: [52247eb98ebec43288b5da7033c5b757a6fbd1d0] Linux 4.4.233-rc1
# good: [e164d5f7b274f422f9cd4fa6a6638ea07c4969f1] Linux 4.4.232
git bisect start 'HEAD' 'v4.4.232'
# bad: [3bc53626ab45d3886eef382b83973969dc6fc429] x86, vmlinux.lds: Page-align end of ..page_aligned sections
git bisect bad 3bc53626ab45d3886eef382b83973969dc6fc429
# good: [34fda3ae46a68f53e4b18d9b5b560a9cecabb072] scsi: libsas: direct call probe and destruct
git bisect good 34fda3ae46a68f53e4b18d9b5b560a9cecabb072
# good: [994de7ca7e88d4c5e8893bd695dadf9af4751bc6] f2fs: check memory boundary by insane namelen
git bisect good 994de7ca7e88d4c5e8893bd695dadf9af4751bc6
# good: [93aa53738c81fc0e286b8cb37533e9697ae7ea6f] ARM: 8986/1: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints
git bisect good 93aa53738c81fc0e286b8cb37533e9697ae7ea6f
# bad: [d1c993b94751c4a84604711b378906ab91fb16ad] x86/build/lto: Fix truncated .bss with -fdata-sections
git bisect bad d1c993b94751c4a84604711b378906ab91fb16ad
# first bad commit: [d1c993b94751c4a84604711b378906ab91fb16ad] x86/build/lto: Fix truncated .bss with -fdata-sections
The node distance is hardcoded to 0, which causes trouble
for some user-level applications. In particular, libnuma
expects the distance from a node to itself to be LOCAL_DISTANCE.
This update removes the offending node distance override.
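For comparison: when an architecture does not define node_distance(), include/linux/topology.h falls back to a definition that returns LOCAL_DISTANCE (10) for a node and itself and REMOTE_DISTANCE (20) otherwise. The userspace sketch below places that fallback next to the removed s390 override; the function names are made up for illustration.

```c
/* Distances as defined in include/linux/topology.h. */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/* What the removed s390 override returned: 0 for every pair of nodes. */
static int old_s390_node_distance(int a, int b)
{
	(void)a;
	(void)b;
	return 0;
}

/* Mirror of the generic fallback used when node_distance() is not
 * overridden by the architecture. */
static int generic_node_distance(int from, int to)
{
	return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}
```

A libnuma-style check that node N is LOCAL_DISTANCE from itself fails against the old override and passes against the fallback.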
Cc: <stable(a)vger.kernel.org> # v5.6+
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Fixes: 701dc81e7412 ("s390/mm: remove fake numa support")
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
---
arch/s390/include/asm/topology.h | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
index fbb5075..3a0ac0c 100644
--- a/arch/s390/include/asm/topology.h
+++ b/arch/s390/include/asm/topology.h
@@ -86,12 +86,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
#define pcibus_to_node(bus) __pcibus_to_node(bus)
-#define node_distance(a, b) __node_distance(a, b)
-static inline int __node_distance(int a, int b)
-{
- return 0;
-}
-
#else /* !CONFIG_NUMA */
#define numa_node_id numa_node_id
--
1.8.3.1
This patch fixes an ext4 direct I/O read error that occurs when
the read size is not aligned with the block size.
The following test illustrates the error.
(1) Create a file whose size is not aligned with the block size:
$dd if=/dev/zero of=./test.jar bs=1000 count=3
(2) I wrote a source file named "direct_io_read_file.c" as follows:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/types.h>
#define BUF_SIZE 1024
int main(void)
{
	int fd;
	int ret;
	unsigned char *buf;
	/* O_DIRECT needs a suitably aligned user buffer */
	ret = posix_memalign((void **)&buf, 512, BUF_SIZE);
	if (ret) {
		/* posix_memalign() returns the error instead of setting errno */
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		exit(1);
	}
	fd = open("./test.jar", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open ./test.jar failed");
		exit(1);
	}
	/* read until EOF (0) or an error (-1) */
	do {
		ret = read(fd, buf, BUF_SIZE);
		printf("ret=%d\n", ret);
		if (ret < 0)
			perror("read test.jar failed");
	} while (ret > 0);
	free(buf);
	close(fd);
	return 0;
}
(3) Compile the source file:
$gcc direct_io_read_file.c -D_GNU_SOURCE
(4) Run the test program:
$./a.out
The result is as follows:
ret=1024
ret=1024
ret=952
ret=-1
read test.jar failed: Invalid argument
I have tested this program on XFS, and XFS does not have
this problem because it uses iomap_dio_rw() for direct I/O
reads, and the read offset is compared with the file size
in iomap_dio_rw(). The code is as follows:
if (pos < size) {
retval = filemap_write_and_wait_range(mapping, pos,
pos + iov_length(iov, nr_segs) - 1);
if (!retval) {
retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
}
...
}
That is, direct I/O is performed only when "pos < size"; otherwise 0 is returned.
I have tested the fix on ext4, and the behavior now conforms to the
description of EINVAL in read(2):
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
EINVAL
fd is attached to an object which is unsuitable for reading;
or the file was opened with the O_DIRECT flag, and either the
address specified in buf, the value specified in count, or the
current file offset is not suitably aligned.
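Reading at or past the end of the file matches none of these conditions,
so read() should return 0 there rather than fail with EINVAL. I therefore
think this patch can be applied to fix the ext4 direct I/O read error.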
However, kernel 5.5 converted ext4 direct I/O reads to the iomap
infrastructure in commit b1b4705d54ab ("ext4: introduce direct I/O read
using iomap infrastructure"). Since then ext4, like XFS, uses
iomap_dio_rw() for direct I/O reads, so this problem does not exist
for ext4 on kernel 5.5 and later.
From the above description, this problem exists in all kernel versions
from 3.14 through 5.4, and it causes applications to fail to read. For
example, when a search service downloads a new full index file while the
search engine is still loading the previous index file and serving search
requests, it cannot use buffered I/O, because that could evict the
previous, in-use index file from the page cache; the search service must
therefore use direct I/O reads.
Please apply this patch to these kernel versions, or backport the approach
used in kernel 5.5 to fix this problem.
Fixes: 9fe55eea7e4b ("Fix race when checking i_size on direct i/o read")
Reviewed-by: Jan Kara <jack(a)suse.cz>
Co-developed-by: Wang Long <wanglong19(a)meituan.com>
Signed-off-by: Wang Long <wanglong19(a)meituan.com>
Signed-off-by: Jiang Ying <jiangying8582(a)126.com>
---
fs/ext4/inode.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa2..a66b0ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3821,6 +3821,11 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
struct inode *inode = mapping->host;
size_t count = iov_iter_count(iter);
ssize_t ret;
+ loff_t offset = iocb->ki_pos;
+ loff_t size = i_size_read(inode);
+
+ if (offset >= size)
+ return 0;
/*
* Shared inode_lock is enough for us - it protects against concurrent
--
1.8.3.1
Commit df58fae72428b ("smb3: Incorrect size for netname negotiate
context"), which was added in 5.4, turns out to be more important than
we realized: it fixes a feature added in 5.3 by commit 96d3cca1241d6,
which sends the "netname context" during protocol negotiation.
Commit df58fae72428b should be cc:stable for 5.3.
--
Thanks,
Steve
This patch set aims to replace the old IP_TOS_MASK with the new IP_DSCP_MASK,
as the TOS interpretation has been obsolete for a long time. But to make sure
we don't break any existing behaviour, we can't simply replace every
IP_TOS_MASK with IP_DSCP_MASK.
So let's update it case by case. The first issue we fix is that vxlan
loses the first 3 bits of the DSCP field before xmit. Using the
new RT_DSCP() resolves this.
v2: Remove the IP_DSCP() definition as it duplicates RT_DSCP().
Post the patch to net instead of net-next as we need to fix the vxlan issue.
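The effect on a DSCP value can be shown with plain bit masks. IPTOS_TOS_MASK (0x1E) and RT_TOS() below mirror the existing kernel UAPI definitions; the value 0xfc for IP_DSCP_MASK and the RT_DSCP() form are assumptions about what this series adds, and dscp_after_mask() is a hypothetical helper. With the TOS mask, DSCP 46 (Expedited Forwarding, TOS byte 0xb8) is reduced to 6 because its top three bits are cleared; with the DSCP mask it survives unchanged.

```c
/* Existing definitions (include/uapi/linux/ip.h, in_route.h). */
#define IPTOS_TOS_MASK 0x1E
#define RT_TOS(tos)    ((tos) & IPTOS_TOS_MASK)

/* Assumed new definitions: keep all six DSCP bits, drop the ECN bits. */
#define IP_DSCP_MASK   0xfc
#define RT_DSCP(tos)   ((tos) & IP_DSCP_MASK)

/* Hypothetical helper: build a TOS byte from a DSCP value (0..63),
 * mask it, and return the DSCP value that survives. */
static unsigned int dscp_after_mask(unsigned int dscp, int use_dscp_mask)
{
	unsigned int tos = (dscp << 2) & 0xff;	/* DSCP is the upper 6 bits */
	unsigned int masked = use_dscp_mask ? RT_DSCP(tos) : RT_TOS(tos);

	return masked >> 2;
}
```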
Hangbin Liu (2):
net: add IP_DSCP_MASK
vxlan: fix getting tos value from DSCP field
drivers/net/vxlan.c | 4 ++--
include/uapi/linux/in_route.h | 1 +
include/uapi/linux/ip.h | 1 +
3 files changed, 4 insertions(+), 2 deletions(-)
--
2.25.4
On Tue, Aug 4, 2020 at 5:52 PM Marc Plumb <lkml.mplumb(a)gmail.com> wrote:
>
> TL;DR This change takes the seed data from get_random_bytes and broadcasts it to the network, thereby destroying the security of dev/random. This change needs to be reverted and redesigned.
This was discussed.
It's theoretical, not practical.
The patch improves real security, and the fake "but in theory" kind is
meaningless and people should stop that kind of behavior.
Linus
Willy and Ted,
This commit has serious security flaws
f227e3ec3b5cad859ad15666874405e8c1bbc1d4
TL;DR This change takes the seed data from get_random_bytes and
broadcasts it to the network, thereby destroying the security of
/dev/random. This change needs to be reverted and redesigned.
It is inefficient:
This function is called from an interrupt context, so there is no chance
of a CPU switch; therefore this_cpu_add should be __this_cpu_add.
This is a sign that the patch may have been rushed and may not be
suitable for a stable release.
It is fixing the wrong problem:
The net_rand_state PRNG is a weak PRNG for the purpose of avoiding
collisions, not to be unguessable to an attacker. The network PRNG does
not need secure seeding. If you need a secure PRNG then you shouldn't be
using the net_rand_state PRNG. Please reconsider why you think that this
change is necessary.
It dramatically weakens /dev/random:
Seeding two PRNGs with the same entropy causes two problems. The minor
one is that you're double counting entropy. The major one is that anyone
who can determine the state of one PRNG can determine the state of the
other.
The net_rand_state PRNG is effectively a 113 bit LFSR, so anyone who can
see any 113 bits of output can determine the complete internal state.
The output of the net_rand_state PRNG is used to determine how data is
sent to the network, so the output is effectively broadcast to anyone
watching network traffic. Therefore anyone watching the network traffic
can determine the seed data being fed to the net_rand_state PRNG. Since
this is the same seed data being fed to get_random_bytes, it allows an
attacker to determine the state and the output of /dev/random. I
sincerely hope that this was not the intended goal. :)
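The linearity argument can be checked directly. The sketch below follows the four-component Tausworthe generator (taus113) that backs prandom_u32_state() in lib/random32.c; the constants are believed to match that file but should be checked against the source, and struct taus_state and taus_next() are hypothetical names. The masks discard 1, 3, 4 and 7 low bits of the four 32-bit words, leaving 128 - 15 = 113 bits of effective state. Every operation is a shift, XOR or constant mask, so the generator is linear over GF(2): stepping the XOR of two states yields the XOR of their outputs, which is why enough observed output bits determine the whole state.

```c
#include <stdint.h>

struct taus_state {
	uint32_t s1, s2, s3, s4;
};

/* One Tausworthe component step: only shifts, XOR and a constant mask. */
#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Advance all four components and return the combined 32-bit output. */
static uint32_t taus_next(struct taus_state *st)
{
	st->s1 = TAUSWORTHE(st->s1,  6U, 13U, 4294967294U, 18U);
	st->s2 = TAUSWORTHE(st->s2,  2U, 27U, 4294967288U,  2U);
	st->s3 = TAUSWORTHE(st->s3, 13U, 21U, 4294967280U,  7U);
	st->s4 = TAUSWORTHE(st->s4,  3U, 12U, 4294967168U, 13U);
	return st->s1 ^ st->s2 ^ st->s3 ^ st->s4;
}
```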
Thank you
Marc
From: Muchun Song <songmuchun(a)bytedance.com>
We found a case of kernel panic on our server. The stack trace is as
follows (some irrelevant information omitted):
BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
ftrace_ops_assist_func+0x56/0xe0
ftrace_call+0x5/0x34
tcpa_statistic_send+0x5/0x130 [ttcp_engine]
tcpa_statistic_send is the function being kprobed. After analysis, we
found that the root cause is that regs, the fourth parameter of
kprobe_ftrace_handler, is NULL. Why is regs NULL? We used the crash tool
to analyze the kdump.
crash> dis tcpa_statistic_send -r
<tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller>
tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller,
which explains why regs, the fourth parameter of kprobe_ftrace_handler,
is NULL. It should have called ftrace_regs_caller instead of
ftrace_caller. After in-depth analysis, we found a way to reproduce it.
Write a simple kernel module that starts a periodic timer. The
timer's handler is named 'kprobe_test_timer_handler' and the module
is named kprobe_test.ko. Then:
1) insmod kprobe_test.ko
2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
3) echo 0 > /proc/sys/kernel/ftrace_enabled
4) rmmod kprobe_test
5) stop step 2) kprobe
6) insmod kprobe_test.ko
7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
In step 4) the kprobe is marked as GONE but is not disarmed, and step 5)
does not disarm it either when the kprobe is unregistered, so its ip is
never removed from the ftrace filter. When the module is loaded again in
step 6), ftrace_module_enable() patches the call site to ftrace_caller.
When the kprobe is registered again, ftrace_caller is not replaced with
ftrace_regs_caller because ftrace was disabled in step 3), so step 7)
triggers a kernel panic. Fix this problem by disarming the kprobe when
the module is going away.
Link: https://lkml.kernel.org/r/20200728064536.24405-1-songmuchun@bytedance.com
Cc: stable(a)vger.kernel.org
Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Co-developed-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/kprobes.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 4a904cc56d68..07bf03fcf574 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2113,6 +2113,13 @@ static void kill_kprobe(struct kprobe *p)
* the original probed function (which will be freed soon) any more.
*/
arch_remove_kprobe(p);
+
+ /*
+ * The module is going away. We should disarm the kprobe which
+ * is using ftrace.
+ */
+ if (kprobe_ftrace(p))
+ disarm_kprobe_ftrace(p);
}
/* Disable one kprobe */
--
2.26.2
From: Nick Desaulniers <ndesaulniers(a)google.com>
__tracepoint_string's have their string data stored in .rodata, and an
address to that data stored in the "__tracepoint_str" section. Functions
that refer to those strings refer to the symbol of the address. Compiler
optimization can replace those address references with references
directly to the string data. If the address doesn't appear to have other
uses, then it appears dead to the compiler and is removed. This can
break the /tracing/printk_formats sysfs node which iterates the
addresses stored in the "__tracepoint_str" section.
Like other strings stored in custom sections in this header, mark these
__used to inform the compiler that there are other non-obvious users of
the address, so they should still be emitted.
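The mechanism can be reproduced in userspace. The sketch below is an analogy rather than kernel code, assuming GCC or Clang on Linux with a GNU-compatible ELF linker (ld.bfd or lld), which defines __start_<section> and __stop_<section> for sections whose names are valid C identifiers. The section name demo_tp_str and the helper names are hypothetical. Because nothing reads the pointers directly, an optimizing compiler may discard them if `used` is dropped, which is the failure this patch fixes; with `used`, walking the section finds both strings, much as printk_formats walks "__tracepoint_str".

```c
#include <stddef.h>
#include <string.h>

/* Each pointer goes into a custom section; "used" keeps it emitted even
 * though no code refers to the variable by name. */
#define DEMO_TP_STRING(name, str) \
	static const char *name \
	__attribute__((section("demo_tp_str"), used)) = (str)

DEMO_TP_STRING(demo_fmt_hello, "hello %d\n");
DEMO_TP_STRING(demo_fmt_bye, "bye %s\n");

/* Section bounds provided by the linker. */
extern const char *__start_demo_tp_str[];
extern const char *__stop_demo_tp_str[];

/* Number of string pointers stored in the section. */
static size_t demo_tp_count(void)
{
	return (size_t)(__stop_demo_tp_str - __start_demo_tp_str);
}

/* Walk the section and look for a given format string. */
static int demo_tp_contains(const char *fmt)
{
	for (const char **p = __start_demo_tp_str; p < __stop_demo_tp_str; p++)
		if (strcmp(*p, fmt) == 0)
			return 1;
	return 0;
}
```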
Link: https://lkml.kernel.org/r/20200730224555.2142154-2-ndesaulniers@google.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: 102c9323c35a8 ("tracing: Add __tracepoint_string() to export string pointers")
Reported-by: Tim Murray <timmurray(a)google.com>
Reported-by: Simon MacMullen <simonmacm(a)google.com>
Suggested-by: Greg Hackmann <ghackmann(a)google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
include/linux/tracepoint.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index a1fecf311621..3a5b717d92e8 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -361,7 +361,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
static const char *___tp_str __tracepoint_string = str; \
___tp_str; \
})
-#define __tracepoint_string __attribute__((section("__tracepoint_str")))
+#define __tracepoint_string __attribute__((section("__tracepoint_str"), used))
#else
/*
* tracepoint_string() is used to save the string address for userspace
--
2.26.2
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 3ff3d4f43856 - x86/i8259: Use printk_deferred() to prevent deadlock
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 2:
✅ Boot test
✅ ACPI table test
✅ ACPI enabled test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking firewall: basic netfilter test
🚧 ✅ audit: audit testsuite test
🚧 ✅ trace: ftrace/tracer
🚧 ✅ kdump - kexec_boot
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking firewall: basic netfilter test
🚧 ✅ audit: audit testsuite test
🚧 ✅ trace: ftrace/tracer
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
🚧 ⚡⚡⚡ kdump - sysrq-c
Host 3:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ trace: ftrace/tracer
Host 2:
✅ Boot test
✅ selinux-policy: serge-testsuite
✅ stress: stress-ng
🚧 ❌ Storage blktests
Host 3:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking firewall: basic netfilter test
🚧 ✅ audit: audit testsuite test
🚧 ✅ trace: ftrace/tracer
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ✅ CPU: Frequency Driver Test
🚧 ✅ CPU: Idle Test
🚧 ✅ xfstests - btrfs
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 2:
✅ Boot test
🚧 ✅ kdump - sysrq-c
🚧 ✅ kdump - file-load
Host 3:
✅ Boot test
✅ ACPI table test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: sanity smoke test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking firewall: basic netfilter test
🚧 ✅ audit: audit testsuite test
🚧 ✅ trace: ftrace/tracer
🚧 ✅ kdump - kexec_boot
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within a reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
The patch titled
Subject: hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem
has been added to the -mm tree. Its filename is
hugetlbfs-remove-call-to-huge_pte_alloc-without-i_mmap_rwsem.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-remove-call-to-huge_pte_…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-remove-call-to-huge_pte_…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem
Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
in at least read mode. This is because the explicit locking in
huge_pmd_share (called by huge_pte_alloc) was removed. When restructuring
the code, the call to huge_pte_alloc in the else block at the beginning of
hugetlb_fault was missed.
Unfortunately, that else clause is exercised when there is no page table
entry. This will likely lead to a call to huge_pmd_share. If
huge_pmd_share thinks pmd sharing is possible, it will traverse the
mapping tree (i_mmap) without holding i_mmap_rwsem. If someone else is
modifying the tree, bad things such as addressing exceptions or worse
could happen.
Simply remove the else clause. It should have been removed previously.
The code following the else will call huge_pte_alloc with the appropriate
locking.
To prevent this type of issue in the future, add routines to assert that
i_mmap_rwsem is held, and call these routines in huge pmd sharing
routines.
Link: http://lkml.kernel.org/r/e670f327-5cf9-1959-96e4-6dc7cc30d3d5@oracle.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Suggested-by: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Kirill A.Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fs.h | 10 ++++++++++
include/linux/hugetlb.h | 8 +++++---
mm/hugetlb.c | 15 +++++++--------
mm/rmap.c | 2 +-
4 files changed, 23 insertions(+), 12 deletions(-)
--- a/include/linux/fs.h~hugetlbfs-remove-call-to-huge_pte_alloc-without-i_mmap_rwsem
+++ a/include/linux/fs.h
@@ -518,6 +518,16 @@ static inline void i_mmap_unlock_read(st
up_read(&mapping->i_mmap_rwsem);
}
+static inline void i_mmap_assert_locked(struct address_space *mapping)
+{
+ lockdep_assert_held(&mapping->i_mmap_rwsem);
+}
+
+static inline void i_mmap_assert_write_locked(struct address_space *mapping)
+{
+ lockdep_assert_held_write(&mapping->i_mmap_rwsem);
+}
+
/*
* Might pages of this file be mapped into userspace?
*/
--- a/include/linux/hugetlb.h~hugetlbfs-remove-call-to-huge_pte_alloc-without-i_mmap_rwsem
+++ a/include/linux/hugetlb.h
@@ -164,7 +164,8 @@ pte_t *huge_pte_alloc(struct mm_struct *
unsigned long addr, unsigned long sz);
pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz);
-int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep);
+int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long *addr, pte_t *ptep);
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end);
struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
@@ -203,8 +204,9 @@ static inline struct address_space *huge
return NULL;
}
-static inline int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr,
- pte_t *ptep)
+static inline int huge_pmd_unshare(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long *addr, pte_t *ptep)
{
return 0;
}
--- a/mm/hugetlb.c~hugetlbfs-remove-call-to-huge_pte_alloc-without-i_mmap_rwsem
+++ a/mm/hugetlb.c
@@ -3953,7 +3953,7 @@ void __unmap_hugepage_range(struct mmu_g
continue;
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, &address, ptep)) {
+ if (huge_pmd_unshare(mm, vma, &address, ptep)) {
spin_unlock(ptl);
/*
* We just unmapped a page of PMDs by clearing a PUD.
@@ -4540,10 +4540,6 @@ vm_fault_t hugetlb_fault(struct mm_struc
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
VM_FAULT_SET_HINDEX(hstate_index(h));
- } else {
- ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
- if (!ptep)
- return VM_FAULT_OOM;
}
/*
@@ -5020,7 +5016,7 @@ unsigned long hugetlb_change_protection(
if (!ptep)
continue;
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, &address, ptep)) {
+ if (huge_pmd_unshare(mm, vma, &address, ptep)) {
pages++;
spin_unlock(ptl);
shared_pmd = true;
@@ -5401,12 +5397,14 @@ out:
* returns: 1 successfully unmapped a shared pte page
* 0 the underlying pte page is not shared, or it is the last user
*/
-int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long *addr, pte_t *ptep)
{
pgd_t *pgd = pgd_offset(mm, *addr);
p4d_t *p4d = p4d_offset(pgd, *addr);
pud_t *pud = pud_offset(p4d, *addr);
+ i_mmap_assert_write_locked(vma->vm_file->f_mapping);
BUG_ON(page_count(virt_to_page(ptep)) == 0);
if (page_count(virt_to_page(ptep)) == 1)
return 0;
@@ -5424,7 +5422,8 @@ pte_t *huge_pmd_share(struct mm_struct *
return NULL;
}
-int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long *addr, pte_t *ptep)
{
return 0;
}
--- a/mm/rmap.c~hugetlbfs-remove-call-to-huge_pte_alloc-without-i_mmap_rwsem
+++ a/mm/rmap.c
@@ -1469,7 +1469,7 @@ static bool try_to_unmap_one(struct page
* do this outside rmap routines.
*/
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
- if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
+ if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
/*
* huge_pmd_unshare unmapped an entire PMD
* page. There is no way of knowing exactly
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlbfs-prevent-filesystem-stacking-of-hugetlbfs.patch
hugetlbfs-remove-call-to-huge_pte_alloc-without-i_mmap_rwsem.patch
cma-dont-quit-at-first-error-when-activating-reserved-areas.patch
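The idea behind the new i_mmap_assert_locked()/i_mmap_assert_write_locked() helpers can be illustrated with a small userspace model. This is only a sketch under stated assumptions: model_rwsem, the model_* functions and the lock_warnings counter are hypothetical names, the model tracks lock state per lock rather than per task as lockdep does, and incrementing a counter stands in for a lockdep warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an rw semaphore that remembers how it
 * is held. Lockdep tracks this per task; this model only tracks it per lock. */
struct model_rwsem {
	int readers;  /* current read holders */
	bool writer;  /* true while held for write */
};

/* Counts failed assertions; stands in for a lockdep warning splat. */
static int lock_warnings;

static void model_down_read(struct model_rwsem *s)
{
	assert(!s->writer);
	s->readers++;
}

static void model_up_read(struct model_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static void model_down_write(struct model_rwsem *s)
{
	assert(!s->writer && s->readers == 0);
	s->writer = true;
}

static void model_up_write(struct model_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Like i_mmap_assert_locked(): read or write mode both satisfy it. */
static void model_assert_locked(const struct model_rwsem *s)
{
	if (!s->writer && s->readers == 0)
		lock_warnings++;
}

/* Like i_mmap_assert_write_locked(): only write mode satisfies it. */
static void model_assert_write_locked(const struct model_rwsem *s)
{
	if (!s->writer)
		lock_warnings++;
}
```

In the patch itself, huge_pmd_unshare() gains a vma argument so that it can reach vma->vm_file->f_mapping and perform the write-mode check.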
Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
in at least read mode. This is because the explicit locking in
huge_pmd_share (called by huge_pte_alloc) was removed. When restructuring
the code, the call to huge_pte_alloc in the else block at the beginning
of hugetlb_fault was missed.
Unfortunately, that else clause is exercised when there is no page table
entry. This will likely lead to a call to huge_pmd_share. If
huge_pmd_share thinks pmd sharing is possible, it will traverse the mapping
tree (i_mmap) without holding i_mmap_rwsem. If someone else is modifying
the tree, bad things such as addressing exceptions or worse could happen.
Simply remove the else clause. It should have been removed previously.
The code following the else will call huge_pte_alloc with the appropriate
locking.
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
mm/hugetlb.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 590111ea6975..0f6716422a53 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4539,10 +4539,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
VM_FAULT_SET_HINDEX(hstate_index(h));
- } else {
- ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
- if (!ptep)
- return VM_FAULT_OOM;
}
/*
--
2.25.4
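The fault-path control flow after the else clause is removed can be modelled as follows. This is a sketch only: fault_ctx, model_alloc() and model_fault() are hypothetical, a bool stands in for holding i_mmap_rwsem in read mode, and MODEL_OOM/MODEL_HWPOISON stand in for VM_FAULT_OOM and the hwpoison return value.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_OK = 0, MODEL_OOM = 1, MODEL_HWPOISON = 2 };

/* Hypothetical fault context; the bools stand in for page table state
 * and for holding i_mmap_rwsem in read mode. */
struct fault_ctx {
	bool have_entry;      /* a page table entry already exists */
	bool entry_poisoned;  /* that entry is a hwpoison marker */
	bool alloc_fails;     /* make the allocator simulate OOM */
	bool mapping_locked;  /* i_mmap_rwsem held (read) */
	int allocs;           /* calls to the allocator */
	int unlocked_allocs;  /* calls made without the lock held */
};

/* Stand-in for huge_pte_alloc(): may walk the shared i_mmap tree, so it
 * is only safe with the mapping lock held. */
static bool model_alloc(struct fault_ctx *c)
{
	c->allocs++;
	if (!c->mapping_locked)
		c->unlocked_allocs++;
	if (c->alloc_fails)
		return false;
	c->have_entry = true;
	return true;
}

/* Fault path after the fix: the lockless peek only inspects an existing
 * entry; when there is none it simply falls through. */
static int model_fault(struct fault_ctx *c)
{
	if (c->have_entry && c->entry_poisoned)
		return MODEL_HWPOISON;
	/* The removed "else { huge_pte_alloc(); }" would have allocated here,
	 * before the lock below is taken. */

	c->mapping_locked = true;          /* i_mmap_lock_read(mapping) */
	if (!model_alloc(c)) {
		c->mapping_locked = false;     /* i_mmap_unlock_read(mapping) */
		return MODEL_OOM;
	}
	/* ... the rest of the fault is handled under the lock ... */
	c->mapping_locked = false;
	return MODEL_OK;
}
```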
The patch titled
Subject: khugepaged: khugepaged_test_exit() check mmget_still_valid()
has been added to the -mm tree. Its filename is
khugepaged-khugepaged_test_exit-check-mmget_still_valid.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/khugepaged-khugepaged_test_exit-ch…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/khugepaged-khugepaged_test_exit-ch…
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: khugepaged_test_exit() check mmget_still_valid()
Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself. collapse_huge_page() is used for anon THP
only, and earned its mmget_still_valid() check because it inserts a huge
pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry. But core dumping without mmap
lock must have been as open to mistaking a racily cleared pmd entry for a
page table at physical page 0, as exit_mmap() was. And we certainly have
no interest in mapping as a THP once dumping core.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
--- a/mm/khugepaged.c~khugepaged-khugepaged_test_exit-check-mmget_still_valid
+++ a/mm/khugepaged.c
@@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(stru
static inline int khugepaged_test_exit(struct mm_struct *mm)
{
- return atomic_read(&mm->mm_users) == 0;
+ return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
}
static bool hugepage_vma_check(struct vm_area_struct *vma,
@@ -1100,9 +1100,6 @@ static void collapse_huge_page(struct mm
* handled by the anon_vma lock + PG_lock.
*/
mmap_write_lock(mm);
- result = SCAN_ANY_PROCESS;
- if (!mmget_still_valid(mm))
- goto out;
result = hugepage_vma_revalidate(mm, address, &vma);
if (result)
goto out;
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-memcontrol-decouple-reference-counting-from-page-accounting-fix.patch
khugepaged-collapse_pte_mapped_thp-flush-the-right-range.patch
khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock.patch
khugepaged-retract_page_tables-remember-to-test-exit.patch
khugepaged-khugepaged_test_exit-check-mmget_still_valid.patch
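The combined exit test can be sketched in userspace. Assumptions: exit_mm and the exit_* functions are hypothetical, a plain int stands in for the atomic mm_users, and exit_mmget_still_valid() mirrors the mmget_still_valid() of kernels from this period, which returned true unless mm->core_state had been set by a core dump in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two mm_struct fields the check reads. */
struct exit_mm {
	int mm_users;      /* plain int in place of atomic_t */
	void *core_state;  /* non-NULL while a core dump is in progress */
};

/* Mirrors mmget_still_valid() of this era: valid unless dumping core. */
static bool exit_mmget_still_valid(const struct exit_mm *mm)
{
	return mm->core_state == NULL;
}

/* Patched khugepaged_test_exit(): give up if the mm is exiting OR a
 * core dump has started, so no caller can forget the second test. */
static bool exit_test_exit(const struct exit_mm *mm)
{
	return mm->mm_users == 0 || !exit_mmget_still_valid(mm);
}
```

Folding the check into the helper means that every caller of khugepaged_test_exit(), such as hugepage_vma_revalidate(), now also backs off while a core dump is in progress.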
The patch titled
Subject: khugepaged: retract_page_tables() remember to test exit
has been added to the -mm tree. Its filename is
khugepaged-retract_page_tables-remember-to-test-exit.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/khugepaged-retract_page_tables-rem…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/khugepaged-retract_page_tables-rem…
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: retract_page_tables() remember to test exit
Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.
The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.
In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example. But retract_page_tables() did not: fix that.
The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
--- a/mm/khugepaged.c~khugepaged-retract_page_tables-remember-to-test-exit
+++ a/mm/khugepaged.c
@@ -1532,6 +1532,7 @@ out:
static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
{
struct vm_area_struct *vma;
+ struct mm_struct *mm;
unsigned long addr;
pmd_t *pmd, _pmd;
@@ -1560,7 +1561,8 @@ static void retract_page_tables(struct a
continue;
if (vma->vm_end < addr + HPAGE_PMD_SIZE)
continue;
- pmd = mm_find_pmd(vma->vm_mm, addr);
+ mm = vma->vm_mm;
+ pmd = mm_find_pmd(mm, addr);
if (!pmd)
continue;
/*
@@ -1570,17 +1572,19 @@ static void retract_page_tables(struct a
* mmap_lock while holding page lock. Fault path does it in
* reverse order. Trylock is a way to avoid deadlock.
*/
- if (mmap_write_trylock(vma->vm_mm)) {
- spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
- /* assume page table is clear */
- _pmd = pmdp_collapse_flush(vma, addr, pmd);
- spin_unlock(ptl);
- mmap_write_unlock(vma->vm_mm);
- mm_dec_nr_ptes(vma->vm_mm);
- pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+ if (mmap_write_trylock(mm)) {
+ if (!khugepaged_test_exit(mm)) {
+ spinlock_t *ptl = pmd_lock(mm, pmd);
+ /* assume page table is clear */
+ _pmd = pmdp_collapse_flush(vma, addr, pmd);
+ spin_unlock(ptl);
+ mm_dec_nr_ptes(mm);
+ pte_free(mm, pmd_pgtable(_pmd));
+ }
+ mmap_write_unlock(mm);
} else {
/* Try again later */
- khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+ khugepaged_add_pte_mapped_thp(mm, addr);
}
}
i_mmap_unlock_write(mapping);
_
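The shape of the fixed retract_page_tables() loop body (trylock, recheck for exit under the lock, otherwise defer) can be sketched as below. retract_mm and the retract_* helpers are hypothetical stand-ins, not kernel APIs; the page-table teardown and the deferred retry are reduced to a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one mm as seen by retract_page_tables(). */
struct retract_mm {
	int mm_users;        /* 0 once exit_mmap() may be running */
	bool mmap_locked;    /* mmap lock held for write by someone */
	bool table_present;  /* page table still hangs off the pmd */
	int deferred;        /* retries queued for later */
};

static bool retract_write_trylock(struct retract_mm *mm)
{
	if (mm->mmap_locked)
		return false;
	mm->mmap_locked = true;
	return true;
}

static void retract_write_unlock(struct retract_mm *mm)
{
	mm->mmap_locked = false;
}

static bool retract_test_exit(const struct retract_mm *mm)
{
	return mm->mm_users == 0;
}

/* Same shape as the fixed loop body: only touch the page table with the
 * lock held AND the mm not exiting; on trylock failure, retry later. */
static void retract_one(struct retract_mm *mm)
{
	if (retract_write_trylock(mm)) {
		if (!retract_test_exit(mm))
			mm->table_present = false;  /* collapse flush + pte_free() */
		retract_write_unlock(mm);
	} else {
		mm->deferred++;  /* khugepaged_add_pte_mapped_thp() */
	}
}
```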
The patch titled
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock
has been added to the -mm tree. Its filename is
khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/khugepaged-collapse_pte_mapped_thp…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/khugepaged-collapse_pte_mapped_thp…
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 44 +++++++++++++++++++-------------------------
1 file changed, 19 insertions(+), 25 deletions(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock
+++ a/mm/khugepaged.c
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
{
unsigned long haddr = addr & HPAGE_PMD_MASK;
struct vm_area_struct *vma = find_vma(mm, haddr);
- struct page *hpage = NULL;
+ struct page *hpage;
pte_t *start_pte, *pte;
pmd_t *pmd, _pmd;
spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
return;
+ hpage = find_lock_page(vma->vm_file->f_mapping,
+ linear_page_index(vma, haddr));
+ if (!hpage)
+ return;
+
+ if (!PageHead(hpage))
+ goto drop_hpage;
+
pmd = mm_find_pmd(mm, haddr);
if (!pmd)
- return;
+ goto drop_hpage;
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
page = vm_normal_page(vma, addr, *pte);
- if (!page || !PageCompound(page))
- goto abort;
-
- if (!hpage) {
- hpage = compound_head(page);
- /*
- * The mapping of the THP should not change.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may
- * change the page table, but the new page will
- * not pass PageCompound() check.
- */
- if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
- goto abort;
- }
-
/*
- * Confirm the page maps to the correct subpage.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may change
- * the page table, but the new page will not pass
- * PageCompound() check.
+ * Note that uprobe, debugger, or MAP_PRIVATE may change the
+ * page table, but the new page will not be a subpage of hpage.
*/
- if (WARN_ON(hpage + i != page))
+ if (hpage + i != page)
goto abort;
count++;
}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
pte_unmap_unlock(start_pte, ptl);
/* step 3: set proper refcount and mm_counters. */
- if (hpage) {
+ if (count) {
page_ref_sub(hpage, count);
add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+ unlock_page(hpage);
+ put_page(hpage);
return;
abort:
pte_unmap_unlock(start_pte, ptl);
+ goto drop_hpage;
}
static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
_
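The lock-first, single-unwind-label structure adopted by the collapse_pte_mapped_thp() pmd-lock patch can be sketched like this. Everything here is a hypothetical model: thp_page, thp_find_lock() and thp_collapse() only mimic find_lock_page(), unlock_page() and put_page(), and the pte scan is reduced to boolean parameters. The real abort path also drops the pte lock before jumping to drop_hpage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page with just the state this sketch needs. */
struct thp_page {
	bool locked;   /* page lock */
	int refcount;  /* page references */
	bool head;     /* PageHead() */
};

/* Stand-in for find_lock_page(): returns the page referenced and locked. */
static struct thp_page *thp_find_lock(struct thp_page *cached)
{
	if (!cached)
		return NULL;
	cached->refcount++;
	cached->locked = true;
	return cached;
}

/* Shape of the patched collapse_pte_mapped_thp(): take the huge page lock
 * before touching the pmd, and leave through drop_hpage on every path
 * that got that far. Returns 0 if the page table would be retracted. */
static int thp_collapse(struct thp_page *cached, bool have_pmd,
			bool ptes_match)
{
	struct thp_page *hpage = thp_find_lock(cached);
	int ret = -1;

	if (!hpage)
		return -1;            /* nothing taken, nothing to drop */
	if (!hpage->head)
		goto drop_hpage;
	if (!have_pmd)
		goto drop_hpage;
	if (!ptes_match)
		goto drop_hpage;      /* the "abort" path */
	ret = 0;                  /* pmd collapse + pte_free() happen here */
drop_hpage:
	hpage->locked = false;    /* unlock_page() */
	hpage->refcount--;        /* put_page() */
	return ret;
}
```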
The patch titled
Subject: khugepaged: collapse_pte_mapped_thp() flush the right range
has been added to the -mm tree. Its filename is
khugepaged-collapse_pte_mapped_thp-flush-the-right-range.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/khugepaged-collapse_pte_mapped_thp…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/khugepaged-collapse_pte_mapped_thp…
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() flush the right range
pmdp_collapse_flush() should be given the start address at which the huge
page is mapped, haddr: it was given addr, which at that point has been
used as a local variable, incremented to the end address of the extent.
Found by source inspection while chasing a hugepage locking bug, which I
then could not explain by this. At first I thought this was very bad;
then saw that all of the page translations that were not flushed would
actually still point to the right pages afterwards, so harmless; then
realized that I know nothing of how different architectures and models
cache intermediate paging structures, so maybe it matters after all -
particularly since the page table concerned is immediately freed.
Much easier to fix than to think about.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-flush-the-right-range
+++ a/mm/khugepaged.c
@@ -1502,7 +1502,7 @@ void collapse_pte_mapped_thp(struct mm_s
/* step 4: collapse pmd */
ptl = pmd_lock(vma->vm_mm, pmd);
- _pmd = pmdp_collapse_flush(vma, addr, pmd);
+ _pmd = pmdp_collapse_flush(vma, haddr, pmd);
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
_
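The address arithmetic behind the haddr fix can be checked with a few lines of C. The constants assume x86-64 with 4 KiB base pages (512 ptes per PMD, 2 MiB huge pages), and pmd_haddr() and pmd_scan_end_addr() are hypothetical helpers that replay the loop's increments, not kernel functions.

```c
#include <assert.h>

/* Assumed geometry: x86-64 with 4 KiB base pages. */
#define PMD_NR_PTES    512UL
#define BASE_PAGE_SIZE 4096UL
#define PMD_SPAN       (PMD_NR_PTES * BASE_PAGE_SIZE)   /* 2 MiB */
#define PMD_SPAN_MASK  (~(PMD_SPAN - 1))

/* haddr as computed at the top of collapse_pte_mapped_thp(). */
static unsigned long pmd_haddr(unsigned long addr)
{
	return addr & PMD_SPAN_MASK;
}

/* Replays the pte scan loop and returns where 'addr' ends up: the value
 * the unpatched code handed to pmdp_collapse_flush(). */
static unsigned long pmd_scan_end_addr(unsigned long addr)
{
	unsigned long haddr = pmd_haddr(addr);
	unsigned long i;

	for (i = 0, addr = haddr; i < PMD_NR_PTES; i++, addr += BASE_PAGE_SIZE) {
		/* per-pte checks elided */
	}
	return addr;
}
```

Under these assumptions, a fault at 0x40123456 gives haddr 0x40000000, while addr ends the scan at 0x40200000, one full PMD past the start of the range that should have been flushed.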
This reverts commit 9a6418487b56 ("ALSA: hda: call runtime_allow()
for all hda controllers").
The reverted patch introduced regressions on some
machines:
- on Gemini Lake machines, the hda driver reports
"azx_get_response timeout" errors.
- on machines with the ALC662 codec, audio jack detection no longer
works.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208511
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/hda_intel.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index e699873c8293..e34a4d5d047c 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2352,7 +2352,6 @@ static int azx_probe_continue(struct azx *chip)
if (azx_has_pm_runtime(chip)) {
pm_runtime_use_autosuspend(&pci->dev);
- pm_runtime_allow(&pci->dev);
pm_runtime_put_autosuspend(&pci->dev);
}
--
2.17.1