The patch titled
Subject: mm: page_alloc: fix allocation imbalances from speculative cache lookup
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-memcg-accounting-leak-in-speculative-cache-lookup.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: page_alloc: fix allocation imbalances from speculative cache lookup
When the freeing of a higher-order page block (non-compound) races with a
speculative page cache lookup, __free_pages() needs to leave the first
order-0 page in the chunk to the lookup but free the buddy pages that the
lookup doesn't know about separately.
There are currently two problems with it:
1. It checks PageHead() to see whether we're dealing with a compound
page after put_page_testzero(). But the speculative lookup could have
freed the page after our put and cleared PageHead, in which case we
would double free the tail pages.
To fix this, test PageHead before the put and cache the result for
afterwards.
2. If such a higher-order page is charged to a memcg (e.g. !vmap
kernel stack), only the first page of the block has page->memcg set.
That means we'll uncharge only one order-0 page from the entire block,
and leak the remainder.
To fix this, add a split_page_memcg() before it starts freeing tail
pages, to ensure they all have page->memcg set up.
While at it, also update the comments a bit to clarify what exactly is
happening to the page during that race.
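
The two-step flow described above can be sketched as a small single-threaded userspace model. This is only an illustration under stated assumptions: struct model_page and the model_* helpers are hypothetical stand-ins invented here, not kernel APIs, and memcg charging is not modeled. The sketch shows the compound state being sampled before the put, and the buddy pages being released by the caller when a speculative reference still holds the first page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the sketch needs. */
struct model_page {
	int refcount;	/* meaningful on the first page of a block only */
	bool head;	/* models PageHead(): set for compound pages */
	int freed;	/* how many times this page was handed back */
};

/* Models free_the_page(): hands back 2^order pages starting at page. */
static void model_free_block(struct model_page *page, unsigned int order)
{
	for (size_t i = 0; i < ((size_t)1 << order); i++)
		page[i].freed++;
}

/* Models put_page_testzero(): drop one reference, report if it was the last. */
static bool model_put_testzero(struct model_page *page)
{
	assert(page->refcount > 0);
	return --page->refcount == 0;
}

/* Models the __free_pages() flow that the description above asks for. */
static void model_free_pages(struct model_page *page, unsigned int order)
{
	/* Sample the compound state before the put; it may change afterwards. */
	bool head = page->head;

	if (model_put_testzero(page)) {
		model_free_block(page, order);
		return;
	}

	/*
	 * A speculative holder will later free only page[0]. For a
	 * non-compound block, release the buddy pages here instead.
	 */
	if (!head) {
		while (order-- > 0)
			model_free_block(page + ((size_t)1 << order), order);
	}
}
```

For an order-2 block with one outstanding speculative reference, the loop hands back pages 2-3 as an order-1 block and page 1 as an order-0 block, leaving page 0 to whoever drops the last reference.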
Link: https://lkml.kernel.org/r/20210319071547.60973-1-hannes@cmpxchg.org
Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Matthew Wilcox <willy(a)infradead.org>
Acked-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Zhou Guanghui <zhouguanghui1(a)huawei.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org> # 5.10+
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 33 +++++++++++++++++++++++++++------
1 file changed, 27 insertions(+), 6 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-memcg-accounting-leak-in-speculative-cache-lookup
+++ a/mm/page_alloc.c
@@ -5072,10 +5072,9 @@ static inline void free_the_page(struct
* the allocation, so it is easy to leak memory. Freeing more memory
* than was allocated will probably emit a warning.
*
- * If the last reference to this page is speculative, it will be released
- * by put_page() which only frees the first page of a non-compound
- * allocation. To prevent the remaining pages from being leaked, we free
- * the subsequent pages here. If you want to use the page's reference
+ * This function isn't a put_page(). Don't let the put_page_testzero()
+ * fool you, it's only to deal with speculative cache references. It
+ * WILL free pages directly. If you want to use the page's reference
* count to decide when to free the allocation, you should allocate a
* compound page, and use put_page() instead of __free_pages().
*
@@ -5084,11 +5083,33 @@ static inline void free_the_page(struct
*/
void __free_pages(struct page *page, unsigned int order)
{
- if (put_page_testzero(page))
+ /*
+ * Drop the base reference from __alloc_pages and free. In
+ * case there is an outstanding speculative reference, from
+ * e.g. the page cache, it will put and free the page later.
+ */
+ if (likely(put_page_testzero(page))) {
free_the_page(page, order);
- else if (!PageHead(page))
+ return;
+ }
+
+ /*
+ * The speculative reference will put and free the page.
+ *
+ * However, if the speculation was into a higher-order page
+ * chunk that isn't marked compound, the other side will know
+ * nothing about our buddy pages and only free the order-0
+ * page at the start of our chunk! We must split off and free
+ * the buddy pages here.
+ *
+ * The buddy pages aren't individually refcounted, so they
+ * can't have any pending speculative references themselves.
+ */
+ if (!PageHead(page) && order > 0) {
+ split_page_memcg(page, 1 << order);
while (order-- > 0)
free_the_page(page + (1 << order), order);
+ }
}
EXPORT_SYMBOL(__free_pages);
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-page_alloc-fix-allocation-imbalances-from-speculative-cache-lookup.patch
mm-page-writeback-simplify-memcg-handling-in-test_clear_page_writeback.patch
mm-memcontrol-fix-cpuhotplug-statistics-flushing.patch
mm-memcontrol-kill-mem_cgroup_nodeinfo.patch
mm-memcontrol-privatize-memcg_page_state-query-functions.patch
cgroup-rstat-support-cgroup1.patch
cgroup-rstat-punt-root-level-optimization-to-individual-controllers.patch
mm-memcontrol-switch-to-rstat.patch
mm-memcontrol-switch-to-rstat-fix-2.patch
mm-memcontrol-consolidate-lruvec-stat-flushing.patch
kselftests-cgroup-update-kmem-test-for-new-vmstat-implementation.patch
The patch titled
Subject: mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
has been removed from the -mm tree. Its filename was
mm-highmem-fix-config_debug_kmap_local_force_map.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ira Weiny <ira.weiny(a)intel.com>
Subject: mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
The kernel test robot found that __kmap_local_sched_out() was not
correctly skipping the guard pages when CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
was set.[1] This was due to the CONFIG_DEBUG_HIGHMEM check being used.
Change the check to test CONFIG_DEBUG_KMAP_LOCAL instead.
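
As a rough illustration of the guard-slot test that the scheduling hooks apply, here is a minimal userspace sketch. The helper slot_is_guard() and its boolean parameter are hypothetical; in the kernel the condition is a compile-time IS_ENABLED() check rather than a runtime argument.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function: reports whether slot i is
 * a guard slot. debug_kmap_local stands in for the Kconfig option that
 * the check is keyed on.
 */
static bool slot_is_guard(bool debug_kmap_local, int i)
{
	assert(i >= 0);
	/* With debugging on, even slots stay unmapped and act as guards. */
	return debug_kmap_local && !(i & 0x01);
}
```

With the option modeled as enabled, slots 0, 2, 4, ... are skipped and only odd slots carry mappings; with it disabled, no slot is treated as a guard.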
[1] https://lore.kernel.org/lkml/20210304083825.GB17830@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20210318230657.1497881-1-ira.weiny@intel.com
Fixes: 0e91a0c6984c ("mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP")
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Oliver Sang <oliver.sang(a)intel.com>
Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni(a)wdc.com>
Cc: David Sterba <dsterba(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/highmem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/highmem.c~mm-highmem-fix-config_debug_kmap_local_force_map
+++ a/mm/highmem.c
@@ -618,7 +618,7 @@ void __kmap_local_sched_out(void)
int idx;
/* With debug all even slots are unmapped and act as guard */
- if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) {
+ if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) {
WARN_ON_ONCE(!pte_none(pteval));
continue;
}
@@ -654,7 +654,7 @@ void __kmap_local_sched_in(void)
int idx;
/* With debug all even slots are unmapped and act as guard */
- if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) {
+ if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) {
WARN_ON_ONCE(!pte_none(pteval));
continue;
}
_
Patches currently in -mm which might be from ira.weiny(a)intel.com are
iov_iter-lift-memzero_page-to-highmemh.patch
btrfs-use-memzero_page-instead-of-open-coded-kmap-pattern.patch
mm-highmem-remove-deprecated-kmap_atomic.patch
The patch titled
Subject: squashfs: fix inode lookup sanity checks
has been removed from the -mm tree. Its filename was
squashfs-fix-inode-lookup-sanity-checks.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Sean Nyekjaer <sean(a)geanix.com>
Subject: squashfs: fix inode lookup sanity checks
When mounting a squashfs image created without inode compression, it fails
with: "unable to read inode lookup table"
It turns out that the BLOCK_OFFSET is missing when checking the
SQUASHFS_METADATA_SIZE against the actual size.
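
The corrected bound can be sketched in isolation as follows. The helper lookup_extent_ok() is hypothetical and exists only for illustration; the sketch assumes the extra 2 bytes account for the length header stored in front of each metadata block, so an uncompressed block may span 8192 + 2 bytes on disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SQUASHFS_METADATA_SIZE	8192
#define SQUASHFS_BLOCK_OFFSET	2

/*
 * Hypothetical helper: does the extent [start, end) look like a single
 * metadata block, including its header, as stored on disk?
 */
static bool lookup_extent_ok(uint64_t start, uint64_t end)
{
	if (start >= end)
		return false;
	return (end - start) <= SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET;
}
```

Under the old bound an extent of 8194 bytes would have been rejected, which matches the reported mount failure for images written without inode compression.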
Link: https://lkml.kernel.org/r/20210226092903.1473545-1-sean@geanix.com
Fixes: eabac19e40c0 ("squashfs: add more sanity checks in inode lookup")
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
Acked-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/export.c | 8 ++++++--
fs/squashfs/squashfs_fs.h | 1 +
2 files changed, 7 insertions(+), 2 deletions(-)
--- a/fs/squashfs/export.c~squashfs-fix-inode-lookup-sanity-checks
+++ a/fs/squashfs/export.c
@@ -152,14 +152,18 @@ __le64 *squashfs_read_inode_lookup_table
start = le64_to_cpu(table[n]);
end = le64_to_cpu(table[n + 1]);
- if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+ if (start >= end
+ || (end - start) >
+ (SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
kfree(table);
return ERR_PTR(-EINVAL);
}
}
start = le64_to_cpu(table[indexes - 1]);
- if (start >= lookup_table_start || (lookup_table_start - start) > SQUASHFS_METADATA_SIZE) {
+ if (start >= lookup_table_start ||
+ (lookup_table_start - start) >
+ (SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
kfree(table);
return ERR_PTR(-EINVAL);
}
--- a/fs/squashfs/squashfs_fs.h~squashfs-fix-inode-lookup-sanity-checks
+++ a/fs/squashfs/squashfs_fs.h
@@ -17,6 +17,7 @@
/* size of metadata (inode and directory) blocks */
#define SQUASHFS_METADATA_SIZE 8192
+#define SQUASHFS_BLOCK_OFFSET 2
/* default size of block device I/O */
#ifdef CONFIG_SQUASHFS_4K_DEVBLK_SIZE
_
Patches currently in -mm which might be from sean(a)geanix.com are
The patch titled
Subject: z3fold: prevent reclaim/free race for headless pages
has been removed from the -mm tree. Its filename was
z3fold-prevent-reclaim-free-race-for-headless-pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Thomas Hebb <tommyhebb(a)gmail.com>
Subject: z3fold: prevent reclaim/free race for headless pages
Commit ca0246bb97c2 ("z3fold: fix possible reclaim races") introduced the
PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page release."
By atomically testing and setting the bit in each of z3fold_free() and
z3fold_reclaim_page(), a double-free was avoided.
However, commit dcf5aedb24f8 ("z3fold: stricter locking and more careful
reclaim") appears to have unintentionally broken this behavior by moving
the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock
gets taken, which only happens for non-headless pages. For headless
pages, the check is now skipped entirely and races can occur again.
I have observed such a race on my system:
page:00000000ffbd76b7 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x165316
flags: 0x2ffff0000000000()
raw: 02ffff0000000000 ffffea0004535f48 ffff8881d553a170 0000000000000000
raw: 0000000000000000 0000000000000011 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:707!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 2 PID: 291928 Comm: kworker/2:0 Tainted: G B 5.10.7-arch1-1-kasan #1
Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
Workqueue: zswap-shrink shrink_worker
RIP: 0010:__free_pages+0x10a/0x130
Code: c1 e7 06 48 01 ef 45 85 e4 74 d1 44 89 e6 31 d2 41 83 ec 01 e8 e7 b0 ff ff eb da 48 c7 c6 e0 32 91 88 48 89 ef e8 a6 89 f8 ff <0f> 0b 4c 89 e7 e8 fc 79 07 00 e9 33 ff ff ff 48 89 ef e8 ff 79 07
RSP: 0000:ffff88819a2ffb98 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffea000594c5a8 RCX: 0000000000000000
RDX: 1ffffd4000b298b7 RSI: 0000000000000000 RDI: ffffea000594c5b8
RBP: ffffea000594c580 R08: 000000000000003e R09: ffff8881d5520bbb
R10: ffffed103aaa4177 R11: 0000000000000001 R12: ffffea000594c5b4
R13: 0000000000000000 R14: ffff888165316000 R15: ffffea000594c588
FS: 0000000000000000(0000) GS:ffff8881d5500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7c8c3654d8 CR3: 0000000103f42004 CR4: 00000000001706e0
Call Trace:
z3fold_zpool_shrink+0x9b6/0x1240
? sugov_update_single+0x357/0x990
? sched_clock+0x5/0x10
? sched_clock_cpu+0x18/0x180
? z3fold_zpool_map+0x490/0x490
? _raw_spin_lock_irq+0x88/0xe0
shrink_worker+0x35/0x90
process_one_work+0x70c/0x1210
? pwq_dec_nr_in_flight+0x15b/0x2a0
worker_thread+0x539/0x1200
? __kthread_parkme+0x73/0x120
? rescuer_thread+0x1000/0x1000
kthread+0x330/0x400
? __kthread_bind_mask+0x90/0x90
ret_from_fork+0x22/0x30
Modules linked in: rfcomm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ccm algif_aead des_generic libdes ecb algif_skcipher cmac bnep md4 algif_hash af_alg vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm hid_logitech_hidpp kvm at24 mac80211 snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic intel_pmc_bxt snd_hda_codec_hdmi ledtrig_audio iTCO_vendor_support mei_wdt mei_hdcp snd_hda_intel snd_intel_dspcfg libarc4 soundwire_intel irqbypass iwlwifi soundwire_generic_allocation rapl soundwire_cadence intel_cstate snd_hda_codec intel_uncore btusb joydev mousedev snd_usb_audio pcspkr btrtl uvcvideo nouveau btbcm i2c_i801 btintel snd_hda_core videobuf2_vmalloc i2c_smbus snd_usbmidi_lib videobuf2_memops bluetooth snd_hwdep soundwire_bus snd_soc_rt5640 videobuf2_v4l2 cfg80211 snd_soc_rl6231 videobuf2_common snd_rawmidi lpc_ich alx videodev mdio snd_seq_device snd_soc_core mc ecdh_generic mxm_wmi mei_me
hid_logitech_dj wmi snd_compress e1000e ac97_bus mei ttm rfkill snd_pcm_dmaengine ecc snd_pcm snd_timer snd soundcore mac_hid acpi_pad pkcs8_key_parser it87 hwmon_vid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas i915 video intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
---[ end trace 126d646fc3dc0ad8 ]---
To fix the issue, re-add the earlier test-and-set for the case where we
have a headless page.
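
The claim pattern relied on here can be sketched in userspace with C11 atomics. Everything below is a hypothetical model: struct zpage, PAGE_CLAIMED_BIT, and both helpers are invented for illustration, and atomic_fetch_or() stands in for the kernel's test_and_set_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_CLAIMED_BIT (1UL << 0)	/* hypothetical bit position */

/* Hypothetical stand-in for a headless z3fold page. */
struct zpage {
	atomic_ulong flags;
	int freed;	/* how many times the page was released */
};

/* Atomically set the claim bit; true only for the caller that set it first. */
static bool claim_page(struct zpage *page)
{
	unsigned long old = atomic_fetch_or(&page->flags, PAGE_CLAIMED_BIT);

	return !(old & PAGE_CLAIMED_BIT);
}

/* Both the free path and the reclaim path release only after winning the claim. */
static void release_if_claimed(struct zpage *page)
{
	if (!claim_page(page))
		return;	/* the other path owns the release */
	page->freed++;
}
```

Because the test and the set happen in one atomic step, whichever path arrives second sees the bit already set and backs off, so the page is released at most once.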
Link: https://lkml.kernel.org/r/c8106dbe6d8390b290cd1d7f873a2942e805349e.16154520…
Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Thomas Hebb <tommyhebb(a)gmail.com>
Reviewed-by: Vitaly Wool <vitaly.wool(a)konsulko.com>
Cc: Jongseok Kim <ks77sj(a)gmail.com>
Cc: Snild Dolkow <snild(a)sony.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
--- a/mm/z3fold.c~z3fold-prevent-reclaim-free-race-for-headless-pages
+++ a/mm/z3fold.c
@@ -1346,8 +1346,22 @@ static int z3fold_reclaim_page(struct z3
page = list_entry(pos, struct page, lru);
zhdr = page_address(page);
- if (test_bit(PAGE_HEADLESS, &page->private))
+ if (test_bit(PAGE_HEADLESS, &page->private)) {
+ /*
+ * For non-headless pages, we wait to do this
+ * until we have the page lock to avoid racing
+ * with __z3fold_alloc(). Headless pages don't
+ * have a lock (and __z3fold_alloc() will never
+ * see them), but we still need to test and set
+ * PAGE_CLAIMED to avoid racing with
+ * z3fold_free(), so just do it now before
+ * leaving the loop.
+ */
+ if (test_and_set_bit(PAGE_CLAIMED, &page->private))
+ continue;
+
break;
+ }
if (kref_get_unless_zero(&zhdr->refcount) == 0) {
zhdr = NULL;
_
Patches currently in -mm which might be from tommyhebb(a)gmail.com are