This is the start of the stable review cycle for the 5.15.156 release.
There are 45 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.156-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.156-rc1
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/i915/cdclk: Fix CDCLK programming order when pipes are active
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/bugs: Fix BHI handling of RRSBA
Ingo Molnar <mingo(a)kernel.org>
x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/bugs: Fix BHI documentation
Daniel Sneddon <daniel.sneddon(a)linux.intel.com>
x86/bugs: Fix return type of spectre_bhi_state()
Arnd Bergmann <arnd(a)arndb.de>
irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
Adam Dunlap <acdunlap(a)google.com>
x86/apic: Force native_apic_mem_read() to use the MOV instruction
John Stultz <jstultz(a)google.com>
selftests: timers: Fix abs() warning in posix_timers test
Sean Christopherson <seanjc(a)google.com>
x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
Namhyung Kim <namhyung(a)kernel.org>
perf/x86: Fix out of range data
Gavin Shan <gshan(a)redhat.com>
vhost: Add smp_rmb() in vhost_vq_avail_empty()
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/client: Fully protect modes[] with dev->mode_config.mutex
Boris Burkov <boris(a)bur.io>
btrfs: qgroup: correctly model root qgroup rsv in convert
Jacob Pan <jacob.jun.pan(a)linux.intel.com>
iommu/vt-d: Allocate local memory for page request queue
Arnd Bergmann <arnd(a)arndb.de>
tracing: hide unused ftrace_event_id_fops
David Arinzon <darinzon(a)amazon.com>
net: ena: Fix incorrect descriptor free behavior
David Arinzon <darinzon(a)amazon.com>
net: ena: Wrong missing IO completions check order
David Arinzon <darinzon(a)amazon.com>
net: ena: Fix potential sign extension issue
Michal Luczaj <mhal(a)rbox.co>
af_unix: Fix garbage collector racing against connect()
Kuniyuki Iwashima <kuniyu(a)amazon.com>
af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
Arınç ÜNAL <arinc.unal(a)arinc9.com>
net: dsa: mt7530: trap link-local frames regardless of ST Port State
Daniel Machon <daniel.machon(a)microchip.com>
net: sparx5: fix wrong config being used when reconfiguring PCS
Cosmin Ratiu <cratiu(a)nvidia.com>
net/mlx5: Properly link new fs rules into the tree
Eric Dumazet <edumazet(a)google.com>
netfilter: complete validation of user input
Jiri Benc <jbenc(a)redhat.com>
ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
Arnd Bergmann <arnd(a)arndb.de>
ipv4/route: avoid unused-but-set-variable warning
Arnd Bergmann <arnd(a)arndb.de>
ipv6: fib: hide unused 'pn' variable
Geetha sowjanya <gakula(a)marvell.com>
octeontx2-af: Fix NIX SQ mode and BP config
Kuniyuki Iwashima <kuniyu(a)amazon.com>
af_unix: Clear stale u->oob_skb.
Eric Dumazet <edumazet(a)google.com>
geneve: fix header validation in geneve[6]_xmit_skb
Eric Dumazet <edumazet(a)google.com>
xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
u64_stats: Disable preemption on 32bit UP+SMP PREEMPT_RT during updates.
Ilya Maximets <i.maximets(a)ovn.org>
net: openvswitch: fix unwanted error log on timeout policy probing
Dan Carpenter <dan.carpenter(a)linaro.org>
scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
Arnd Bergmann <arnd(a)arndb.de>
nouveau: fix function cast warning
Alex Constantino <dreaming.about.electric.sheep(a)gmail.com>
Revert "drm/qxl: simplify qxl_fence_wait"
Frank Li <Frank.Li(a)nxp.com>
arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
Nini Song <nini.song(a)mediatek.com>
media: cec: core: remove length check of Timer Status
Dmitry Antipov <dmantipov(a)yandex.ru>
Bluetooth: Fix memory leak in hci_req_sync_complete()
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Only update pages_touched when a new page is touched
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Avoid infinite loop trying to resize local TT
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 22 +-
Documentation/admin-guide/kernel-parameters.txt | 12 +-
Makefile | 4 +-
arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 12 +-
arch/x86/Kconfig | 21 +-
arch/x86/events/core.c | 1 +
arch/x86/include/asm/apic.h | 3 +-
arch/x86/kernel/cpu/bugs.c | 82 ++++----
arch/x86/kernel/cpu/common.c | 48 ++---
drivers/gpu/drm/drm_client_modeset.c | 3 +-
drivers/gpu/drm/i915/display/intel_cdclk.c | 7 +-
drivers/gpu/drm/i915/display/intel_cdclk.h | 3 +
.../gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c | 7 +-
drivers/gpu/drm/qxl/qxl_release.c | 50 ++++-
drivers/iommu/intel/svm.c | 2 +-
drivers/media/cec/core/cec-adap.c | 14 --
drivers/net/dsa/mt7530.c | 229 ++++++++++++++++++---
drivers/net/dsa/mt7530.h | 5 +
drivers/net/ethernet/amazon/ena/ena_com.c | 2 +-
drivers/net/ethernet/amazon/ena/ena_netdev.c | 35 ++--
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 22 +-
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 +-
.../net/ethernet/microchip/sparx5/sparx5_port.c | 4 +-
drivers/net/geneve.c | 4 +-
drivers/scsi/qla2xxx/qla_edif.c | 2 +-
drivers/vhost/vhost.c | 14 +-
fs/btrfs/qgroup.c | 2 +
include/linux/dma-fence.h | 7 +
include/linux/irqflags.h | 2 +-
include/linux/u64_stats_sync.h | 42 ++--
include/net/addrconf.h | 4 +
include/net/af_unix.h | 2 +-
include/net/ip_tunnels.h | 33 +++
kernel/cpu.c | 3 +-
kernel/trace/ring_buffer.c | 6 +-
kernel/trace/trace_events.c | 4 +
net/batman-adv/translation-table.c | 2 +-
net/bluetooth/hci_request.c | 4 +-
net/ipv4/netfilter/arp_tables.c | 4 +
net/ipv4/netfilter/ip_tables.c | 4 +
net/ipv4/route.c | 4 +-
net/ipv6/addrconf.c | 7 +-
net/ipv6/ip6_fib.c | 7 +-
net/ipv6/netfilter/ip6_tables.c | 4 +
net/openvswitch/conntrack.c | 5 +-
net/unix/af_unix.c | 8 +-
net/unix/garbage.c | 35 +++-
net/unix/scm.c | 8 +-
net/xdp/xsk.c | 2 +
tools/testing/selftests/timers/posix_timers.c | 2 +-
50 files changed, 556 insertions(+), 256 deletions(-)
Dear Reviewers,
we suggest to backport a commit to Linux kernel-5.10 stable tree to fix thermal bug. Thanks a lot
source patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/d…
thermal: core: Fix thermal zone suspend-resume synchronization
There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:
1. The resume code runs in a PM notifier which is invoked after user
space has been thawed, so it can run concurrently with user space
which can trigger a thermal zone device removal. If that happens,
the thermal zone resume code may use a stale pointer to the next
list element and crash, because it does not hold thermal_list_lock
while walking thermal_tz_list.
2. The thermal zone resume code calls thermal_zone_device_init()
outside the zone lock, so user space or an update triggered by
the platform firmware may see an inconsistent state of a
thermal zone leading to unexpected behavior.
3. Clearing the in_suspend global variable in thermal_pm_notify()
allows __thermal_zone_device_update() to continue for all thermal
zones and it may as well run before the thermal_tz_list walk (or
at any point during the list walk for that matter) and attempt to
operate on a thermal zone that has not been resumed yet. It may
also race destructively with thermal_zone_device_init().
To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.
Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye <bo.ye(a)mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.co thermal: core: Fix thermal zone suspend-resume synchronization
There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:
1. The resume code runs in a PM notifier which is invoked after user
space has been thawed, so it can run concurrently with user space
which can trigger a thermal zone device removal. If that happens,
the thermal zone resume code may use a stale pointer to the next
list element and crash, because it does not hold thermal_list_lock
while walking thermal_tz_list.
2. The thermal zone resume code calls thermal_zone_device_init()
outside the zone lock, so user space or an update triggered by
the platform firmware may see an inconsistent state of a
thermal zone leading to unexpected behavior.
3. Clearing the in_suspend global variable in thermal_pm_notify()
allows __thermal_zone_device_update() to continue for all thermal
zones and it may as well run before the thermal_tz_list walk (or
at any point during the list walk for that matter) and attempt to
operate on a thermal zone that has not been resumed yet. It may
also race destructively with thermal_zone_device_init().
To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.
Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye <bo.ye(a)mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
BRs
Bo Ye
The quilt patch titled
Subject: mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly
has been removed from the -mm tree. Its filename was
mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly
Date: Thu, 14 Mar 2024 17:12:59 +0100
Darrick reports that in some cases where pread() would fail with -EIO and
mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ /
MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT.
While the madvise() call can be interrupted by a signal, this is not the
desired behavior. MADV_POPULATE_READ / MADV_POPULATE_WRITE should behave
like page faults in that case: fail and not retry forever.
A reproducer can be found at [1].
The reason is that __get_user_pages(), as called by
faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way:
it will simply return 0 when VM_FAULT_RETRY happened, making
madvise_populate()->faultin_vma_page_range() retry again and again, never
setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages().
__get_user_pages_locked() does what we want, but duplicating that logic in
faultin_vma_page_range() feels wrong.
So let's use __get_user_pages_locked() instead, that will detect
VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler
return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point, propagating -EFAULT
from faultin_page() to __get_user_pages(), all the way to
madvise_populate().
But, there is an issue: __get_user_pages_locked() will end up re-taking
the MM lock and then __get_user_pages() will do another VMA lookup. In
the meantime, the VMA layout could have changed and we'd fail with
different error codes than we'd want to.
As __get_user_pages() will currently do a new VMA lookup either way, let
it do the VMA handling in a different way, controlled by a new
FOLL_MADV_POPULATE flag, effectively moving these checks from
madvise_populate() + faultin_page_range() in there.
With this change, Darricks reproducer properly fails with -EFAULT, as
documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE.
[1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/
Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com
Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Darrick J. Wong <djwong(a)kernel.org>
Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/
Cc: Darrick J. Wong <djwong(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 54 ++++++++++++++++++++++++++++--------------------
mm/internal.h | 10 +++++---
mm/madvise.c | 17 +--------------
3 files changed, 40 insertions(+), 41 deletions(-)
--- a/mm/gup.c~mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly
+++ a/mm/gup.c
@@ -1206,6 +1206,22 @@ static long __get_user_pages(struct mm_s
/* first iteration or cross vma bound */
if (!vma || start >= vma->vm_end) {
+ /*
+ * MADV_POPULATE_(READ|WRITE) wants to handle VMA
+ * lookups+error reporting differently.
+ */
+ if (gup_flags & FOLL_MADV_POPULATE) {
+ vma = vma_lookup(mm, start);
+ if (!vma) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ if (check_vma_flags(vma, gup_flags)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ goto retry;
+ }
vma = gup_vma_lookup(mm, start);
if (!vma && in_gate_area(mm, start)) {
ret = get_gate_page(mm, start & PAGE_MASK,
@@ -1685,35 +1701,35 @@ long populate_vma_page_range(struct vm_a
}
/*
- * faultin_vma_page_range() - populate (prefault) page tables inside the
- * given VMA range readable/writable
+ * faultin_page_range() - populate (prefault) page tables inside the
+ * given range readable/writable
*
* This takes care of mlocking the pages, too, if VM_LOCKED is set.
*
- * @vma: target vma
+ * @mm: the mm to populate page tables in
* @start: start address
* @end: end address
* @write: whether to prefault readable or writable
* @locked: whether the mmap_lock is still held
*
- * Returns either number of processed pages in the vma, or a negative error
- * code on error (see __get_user_pages()).
+ * Returns either number of processed pages in the MM, or a negative error
+ * code on error (see __get_user_pages()). Note that this function reports
+ * errors related to VMAs, such as incompatible mappings, as expected by
+ * MADV_POPULATE_(READ|WRITE).
*
- * vma->vm_mm->mmap_lock must be held. The range must be page-aligned and
- * covered by the VMA. If it's released, *@locked will be set to 0.
+ * The range must be page-aligned.
+ *
+ * mm->mmap_lock must be held. If it's released, *@locked will be set to 0.
*/
-long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, bool write, int *locked)
+long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked)
{
- struct mm_struct *mm = vma->vm_mm;
unsigned long nr_pages = (end - start) / PAGE_SIZE;
int gup_flags;
long ret;
VM_BUG_ON(!PAGE_ALIGNED(start));
VM_BUG_ON(!PAGE_ALIGNED(end));
- VM_BUG_ON_VMA(start < vma->vm_start, vma);
- VM_BUG_ON_VMA(end > vma->vm_end, vma);
mmap_assert_locked(mm);
/*
@@ -1725,19 +1741,13 @@ long faultin_vma_page_range(struct vm_ar
* a poisoned page.
* !FOLL_FORCE: Require proper access permissions.
*/
- gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE;
+ gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE |
+ FOLL_MADV_POPULATE;
if (write)
gup_flags |= FOLL_WRITE;
- /*
- * We want to report -EINVAL instead of -EFAULT for any permission
- * problems or incompatible mappings.
- */
- if (check_vma_flags(vma, gup_flags))
- return -EINVAL;
-
- ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, locked);
+ ret = __get_user_pages_locked(mm, start, nr_pages, NULL, locked,
+ gup_flags);
lru_add_drain();
return ret;
}
--- a/mm/internal.h~mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly
+++ a/mm/internal.h
@@ -686,9 +686,8 @@ struct anon_vma *folio_anon_vma(struct f
void unmap_mapping_folio(struct folio *folio);
extern long populate_vma_page_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end, int *locked);
-extern long faultin_vma_page_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- bool write, int *locked);
+extern long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked);
extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
unsigned long bytes);
@@ -1127,10 +1126,13 @@ enum {
FOLL_FAST_ONLY = 1 << 20,
/* allow unlocking the mmap lock */
FOLL_UNLOCKABLE = 1 << 21,
+ /* VMA lookup+checks compatible with MADV_POPULATE_(READ|WRITE) */
+ FOLL_MADV_POPULATE = 1 << 22,
};
#define INTERNAL_GUP_FLAGS (FOLL_TOUCH | FOLL_TRIED | FOLL_REMOTE | FOLL_PIN | \
- FOLL_FAST_ONLY | FOLL_UNLOCKABLE)
+ FOLL_FAST_ONLY | FOLL_UNLOCKABLE | \
+ FOLL_MADV_POPULATE)
/*
* Indicates for which pages that are write-protected in the page table,
--- a/mm/madvise.c~mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly
+++ a/mm/madvise.c
@@ -908,27 +908,14 @@ static long madvise_populate(struct vm_a
{
const bool write = behavior == MADV_POPULATE_WRITE;
struct mm_struct *mm = vma->vm_mm;
- unsigned long tmp_end;
int locked = 1;
long pages;
*prev = vma;
while (start < end) {
- /*
- * We might have temporarily dropped the lock. For example,
- * our VMA might have been split.
- */
- if (!vma || start >= vma->vm_end) {
- vma = vma_lookup(mm, start);
- if (!vma)
- return -ENOMEM;
- }
-
- tmp_end = min_t(unsigned long, end, vma->vm_end);
/* Populate (prefault) page tables readable/writable. */
- pages = faultin_vma_page_range(vma, start, tmp_end, write,
- &locked);
+ pages = faultin_page_range(mm, start, end, write, &locked);
if (!locked) {
mmap_read_lock(mm);
locked = 1;
@@ -949,7 +936,7 @@ static long madvise_populate(struct vm_a
pr_warn_once("%s: unhandled return value: %ld\n",
__func__, pages);
fallthrough;
- case -ENOMEM:
+ case -ENOMEM: /* No VMA or out of memory. */
return -ENOMEM;
}
}
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-madvise-dont-perform-madvise-vma-walk-for-madv_populate_readwrite.patch
mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared.patch
mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared-fix.patch
selftests-memfd_secret-add-vmsplice-test.patch
mm-merge-folio_is_secretmem-and-folio_fast_pin_allowed-into-gup_fast_folio_allowed.patch
mm-optimize-config_per_vma_lock-member-placement-in-vm_area_struct.patch
mm-remove-prot-parameter-from-move_pte.patch
mm-gup-consistently-name-gup-fast-functions.patch
mm-treewide-rename-config_have_fast_gup-to-config_have_gup_fast.patch
mm-use-gup-fast-instead-fast-gup-in-remaining-comments.patch
drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.patch
mm-pass-vma-instead-of-mm-to-follow_pte.patch
mm-follow_pte-improvements.patch
mm-allow-for-detecting-underflows-with-page_mapcount-again.patch
mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.patch
mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.patch
mm-track-mapcount-of-large-folios-in-single-value.patch
mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.patch
mm-make-folio_mapcount-return-0-for-small-typed-folios.patch
mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.patch
mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.patch
mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.patch
mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.patch
mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.patch
sh-mm-cache-use-folio_mapped-in-copy_from_user_page.patch
mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.patch
mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.patch
trace-events-page_ref-trace-the-raw-page-mapcount-value.patch
xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.patch
mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.patch
documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.patch
mm-ksm-rename-get_ksm_page_flags-to-ksm_get_folio_flags.patch
mm-ksm-remove-page_mapcount-usage-in-stable_tree_search.patch
loongarch-tlb-fix-error-parameter-ptep-set-but-not-used-due-to-__tlb_remove_tlb_entry.patch
The quilt patch titled
Subject: nilfs2: fix OOB in nilfs_set_de_type
has been removed from the -mm tree. Its filename was
nilfs2-fix-oob-in-nilfs_set_de_type.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jeongjun Park <aha310510(a)gmail.com>
Subject: nilfs2: fix OOB in nilfs_set_de_type
Date: Tue, 16 Apr 2024 03:20:48 +0900
The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is
defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function,
which uses this array, specifies the index to read from the array in the
same way as "(mode & S_IFMT) >> S_SHIFT".
static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode
*inode)
{
umode_t mode = inode->i_mode;
de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob
}
However, when the index is determined this way, an out-of-bounds (OOB)
error occurs by referring to an index that is 1 larger than the array size
when the condition "mode & S_IFMT == S_IFMT" is satisfied. Therefore, a
patch to resize the nilfs_type_by_mode array should be applied to prevent
OOB errors.
Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+2e22057de05b9f3b30d8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nilfs2/dir.c~nilfs2-fix-oob-in-nilfs_set_de_type
+++ a/fs/nilfs2/dir.c
@@ -240,7 +240,7 @@ nilfs_filetype_table[NILFS_FT_MAX] = {
#define S_SHIFT 12
static unsigned char
-nilfs_type_by_mode[S_IFMT >> S_SHIFT] = {
+nilfs_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
[S_IFREG >> S_SHIFT] = NILFS_FT_REG_FILE,
[S_IFDIR >> S_SHIFT] = NILFS_FT_DIR,
[S_IFCHR >> S_SHIFT] = NILFS_FT_CHRDEV,
_
Patches currently in -mm which might be from aha310510(a)gmail.com are
The quilt patch titled
Subject: fork: defer linking file vma until vma is fully initialized
has been removed from the -mm tree. Its filename was
fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: fork: defer linking file vma until vma is fully initialized
Date: Wed, 10 Apr 2024 17:14:41 +0800
Thorvald reported a WARNING [1]. And the root cause is below race:
CPU 1 CPU 2
fork hugetlbfs_fallocate
dup_mmap hugetlbfs_punch_hole
i_mmap_lock_write(mapping);
vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
i_mmap_lock_write(mapping);
hugetlb_vmdelete_list
vma_interval_tree_foreach
hugetlb_vma_trylock_write -- Vma_lock is cleared.
tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
by deferring linking file vma until vma is fully initialized. Those vmas
should be initialized first before they can be used.
Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reported-by: Thorvald Natvig <thorvald(a)google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu(a)oracle.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
--- a/kernel/fork.c~fork-defer-linking-file-vma-until-vma-is-fully-initialized
+++ a/kernel/fork.c
@@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(str
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
+ /*
+ * Copy/update hugetlb private vma information.
+ */
+ if (is_vm_hugetlb_page(tmp))
+ hugetlb_dup_vma_private(tmp);
+
+ /*
+ * Link the vma into the MT. After using __mt_dup(), memory
+ * allocation is not necessary here, so it cannot fail.
+ */
+ vma_iter_bulk_store(&vmi, tmp);
+
+ mm->map_count++;
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
+
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
@@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(str
i_mmap_unlock_write(mapping);
}
- /*
- * Copy/update hugetlb private vma information.
- */
- if (is_vm_hugetlb_page(tmp))
- hugetlb_dup_vma_private(tmp);
-
- /*
- * Link the vma into the MT. After using __mt_dup(), memory
- * allocation is not necessary here, so it cannot fail.
- */
- vma_iter_bulk_store(&vmi, tmp);
-
- mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
retval = copy_page_range(tmp, mpnt);
- if (tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
if (retval) {
mpnt = vma_next(&vmi);
goto loop_out;
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
The quilt patch titled
Subject: Squashfs: check the inode number is not the invalid value of zero
has been removed from the -mm tree. Its filename was
squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: Squashfs: check the inode number is not the invalid value of zero
Date: Mon, 8 Apr 2024 23:02:06 +0100
Syskiller has produced an out of bounds access in fill_meta_index().
That out of bounds access is ultimately caused because the inode
has an inode number with the invalid value of zero, which was not checked.
The reason this causes the out of bounds access is due to following
sequence of events:
1. Fill_meta_index() is called to allocate (via empty_meta_index())
and fill a metadata index. It however suffers a data read error
and aborts, invalidating the newly returned empty metadata index.
It does this by setting the inode number of the index to zero,
which means unused (zero is not a valid inode number).
2. When fill_meta_index() is subsequently called again on another
read operation, locate_meta_index() returns the previous index
because it matches the inode number of 0. Because this index
has been returned it is expected to have been filled, and because
it hasn't been, an out of bounds access is performed.
This patch adds a sanity check which checks that the inode number
is not zero when the inode is created and returns -EINVAL if it is.
[phillip(a)squashfs.org.uk: whitespace fix]
Link: https://lkml.kernel.org/r/20240409204723.446925-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20240408220206.435788-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: "Ubisectech Sirius" <bugreport(a)ubisectech.com>
Closes: https://lore.kernel.org/lkml/87f5c007-b8a5-41ae-8b57-431e924c5915.bugreport…
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/inode.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/fs/squashfs/inode.c~squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero
+++ a/fs/squashfs/inode.c
@@ -48,6 +48,10 @@ static int squashfs_new_inode(struct sup
gid_t i_gid;
int err;
+ inode->i_ino = le32_to_cpu(sqsh_ino->inode_number);
+ if (inode->i_ino == 0)
+ return -EINVAL;
+
err = squashfs_get_id(sb, le16_to_cpu(sqsh_ino->uid), &i_uid);
if (err)
return err;
@@ -58,7 +62,6 @@ static int squashfs_new_inode(struct sup
i_uid_write(inode, i_uid);
i_gid_write(inode, i_gid);
- inode->i_ino = le32_to_cpu(sqsh_ino->inode_number);
inode_set_mtime(inode, le32_to_cpu(sqsh_ino->mtime), 0);
inode_set_atime(inode, inode_get_mtime_sec(inode), 0);
inode_set_ctime(inode, inode_get_mtime_sec(inode), 0);
_
Patches currently in -mm which might be from phillip(a)squashfs.org.uk are
squashfs-remove-deprecated-strncpy-by-not-copying-the-string.patch