From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space through the iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Hans Holmberg <hans.holmberg(a)wdc.com>
(cherry picked from commit 77af13ba3c7f91d91c377c7e2d122849bbc17128)
Orabug: 35351356
Signed-off-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Conflicts:
fs/zonefs/file.c
Small conflicts due to the old API for iomap_dio_rw().
---
fs/zonefs/file.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 49b934c98c5a..27cd84c5e5d0 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -583,11 +583,20 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
append = sync;
}
- if (append)
+ if (append) {
ret = zonefs_file_dio_append(iocb, from);
- else
+ } else {
+ /*
+ * iomap_dio_rw() may return ENOTBLK if there was an issue with
+ * page invalidation. Overwrite that error code with EBUSY to
+ * be consistent with zonefs_file_dio_append() return value for
+ * similar issues.
+ */
ret = iomap_dio_rw(iocb, from, &zonefs_write_iomap_ops,
&zonefs_write_dio_ops, 0, 0);
+ if (ret == -ENOTBLK)
+ ret = -EBUSY;
+ }
if (zonefs_zone_is_seq(z) &&
(ret > 0 || ret == -EIOCBQUEUED)) {
--
2.31.1
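As an illustration of what this error change means for applications, below is a minimal user-space sketch (not part of the patch) of a direct append write to a zonefs sequential zone file that treats EBUSY as a transient, retryable condition. The 4 KiB write size, the file path argument and the "retry later" policy are assumptions for the example only.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define WRITE_SIZE 4096	/* assumed zone write granularity */

int main(int argc, char **argv)
{
	void *buf;
	ssize_t ret;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <zonefs seq zone file>\n", argv[0]);
		return 1;
	}

	/* Sequential zone files are written with O_DIRECT, at EOF. */
	fd = open(argv[1], O_WRONLY | O_DIRECT);
	if (fd < 0 || lseek(fd, 0, SEEK_END) < 0) {
		perror(argv[1]);
		return 1;
	}

	if (posix_memalign(&buf, WRITE_SIZE, WRITE_SIZE))
		return 1;
	memset(buf, 0, WRITE_SIZE);

	ret = write(fd, buf, WRITE_SIZE);
	if (ret < 0 && errno == EBUSY)
		/* Transient page cache invalidation failure: retry later. */
		fprintf(stderr, "write returned EBUSY, retrying later\n");
	else if (ret < 0)
		perror("write");

	free(buf);
	close(fd);
	return ret != WRITE_SIZE;
}

With the patch applied, a page cache invalidation failure in the direct write path surfaces as the EBUSY branch above rather than the obscure ENOTBLK.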
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode which might have been cached
already by buffered reads, page faults with mmap-read or non-direct
readahead. To ensure that the on-disk and cached data are consistent for
such a last cached page, make sure to always invalidate it in
zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to
userspace to differentiate from IO errors.
This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).
Reported-by: Hans Holmberg <Hans.Holmberg(a)wdc.com>
Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Hans Holmberg <hans.holmberg(a)wdc.com>
(cherry picked from commit c1976bd8f23016d8706973908f2bb0ac0d852a8f)
Orabug: 35351356
Signed-off-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
---
fs/zonefs/file.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 5708c54cda69..49b934c98c5a 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -382,6 +382,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
struct zonefs_zone *z = zonefs_inode_zone(inode);
struct block_device *bdev = inode->i_sb->s_bdev;
unsigned int max = bdev_max_zone_append_sectors(bdev);
+ pgoff_t start, end;
struct bio *bio;
ssize_t size = 0;
int nr_pages;
@@ -390,6 +391,19 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
iov_iter_truncate(from, max);
+ /*
+ * If the inode block size (zone write granularity) is smaller than the
+ * page size, we may be appending data belonging to the last page of the
+ * inode straddling inode->i_size, with that page already cached due to
+ * a buffered read or readahead. So make sure to invalidate that page.
+ * This will always be a no-op for the case where the block size is
+ * equal to the page size.
+ */
+ start = iocb->ki_pos >> PAGE_SHIFT;
+ end = (iocb->ki_pos + iov_iter_count(from) - 1) >> PAGE_SHIFT;
+ if (invalidate_inode_pages2_range(inode->i_mapping, start, end))
+ return -EBUSY;
+
nr_pages = iov_iter_npages(from, BIO_MAX_VECS);
if (!nr_pages)
return 0;
--
2.31.1
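To make the invalidated range concrete, here is a small stand-alone sketch (not part of the patch) of the start/end page-index computation added above. It assumes a 64K page kernel with a 4K zone write granularity, the case where the invalidation is not a no-op; the offsets chosen are arbitrary.

#include <stdio.h>

#define PAGE_SHIFT	16	/* assume a 64K page kernel */
#define BLOCK_SIZE	4096ULL	/* assume 4K zone write granularity */

int main(void)
{
	/* Append at a block-aligned i_size that falls inside page 0. */
	unsigned long long ki_pos = 9 * BLOCK_SIZE;	/* 36 KiB */
	unsigned long long count = BLOCK_SIZE;		/* one block appended */
	unsigned long long start = ki_pos >> PAGE_SHIFT;
	unsigned long long end = (ki_pos + count - 1) >> PAGE_SHIFT;

	/*
	 * Here start == end == 0: the single 64K page straddling i_size is
	 * invalidated because a buffered read or readahead may already have
	 * cached it.  With block size == page size, i_size is always
	 * page-aligned, so the range never covers a cached page and the
	 * call is a no-op, as the patch description notes.
	 */
	printf("invalidate page cache indexes %llu..%llu\n", start, end);
	return 0;
}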
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.
Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
(cherry picked from commit 88b170088ad2c3e27086fe35769aa49f8a512564)
Orabug: 35351356
Signed-off-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
---
fs/zonefs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 82f608d57a84..5708c54cda69 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -428,7 +428,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
if (bio->bi_iter.bi_sector != wpsector) {
zonefs_warn(inode->i_sb,
"Corrupted write pointer %llu for zone at %llu\n",
- wpsector, z->z_sector);
+ bio->bi_iter.bi_sector, z->z_sector);
ret = -EIO;
}
}
--
2.31.1
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
If a file zone transitions to the offline or readonly state from an
active state, we must clear the zone active flag and decrement the
active seq file counter. Do so in zonefs_account_active() using the new
zonefs inode flags ZONEFS_ZONE_OFFLINE and ZONEFS_ZONE_READONLY. These
flags are set if necessary in zonefs_check_zone_condition() based on the
result of the report zones operation executed after an IO error.
Fixes: 87c9ce3ffec9 ("zonefs: Add active seq file accounting")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
(cherry picked from commit db58653ce0c7cf4d155727852607106f890005c0)
Orabug: 35351356
Signed-off-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
---
fs/zonefs/super.c | 11 +++++++++++
fs/zonefs/zonefs.h | 6 ++++--
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 355f2da12374..0552c103fcd8 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -40,6 +40,13 @@ static void zonefs_account_active(struct inode *inode)
if (zi->i_ztype != ZONEFS_ZTYPE_SEQ)
return;
+ /*
+ * For zones that transitioned to the offline or readonly condition,
+ * we only need to clear the active state.
+ */
+ if (zi->i_flags & (ZONEFS_ZONE_OFFLINE | ZONEFS_ZONE_READONLY))
+ goto out;
+
/*
* If the zone is active, that is, if it is explicitly open or
* partially written, check if it was already accounted as active.
@@ -53,6 +60,7 @@ static void zonefs_account_active(struct inode *inode)
return;
}
+out:
/* The zone is not active. If it was, update the active count */
if (zi->i_flags & ZONEFS_ZONE_ACTIVE) {
zi->i_flags &= ~ZONEFS_ZONE_ACTIVE;
@@ -324,6 +332,7 @@ static loff_t zonefs_check_zone_condition(struct inode *inode,
inode->i_flags |= S_IMMUTABLE;
inode->i_mode &= ~0777;
zone->wp = zone->start;
+ zi->i_flags |= ZONEFS_ZONE_OFFLINE;
return 0;
case BLK_ZONE_COND_READONLY:
/*
@@ -342,8 +351,10 @@ static loff_t zonefs_check_zone_condition(struct inode *inode,
zone->cond = BLK_ZONE_COND_OFFLINE;
inode->i_mode &= ~0777;
zone->wp = zone->start;
+ zi->i_flags |= ZONEFS_ZONE_OFFLINE;
return 0;
}
+ zi->i_flags |= ZONEFS_ZONE_READONLY;
inode->i_mode &= ~0222;
return i_size_read(inode);
case BLK_ZONE_COND_FULL:
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
index 4b3de66c3233..1dbe78119ff1 100644
--- a/fs/zonefs/zonefs.h
+++ b/fs/zonefs/zonefs.h
@@ -39,8 +39,10 @@ static inline enum zonefs_ztype zonefs_zone_type(struct blk_zone *zone)
return ZONEFS_ZTYPE_SEQ;
}
-#define ZONEFS_ZONE_OPEN (1 << 0)
-#define ZONEFS_ZONE_ACTIVE (1 << 1)
+#define ZONEFS_ZONE_OPEN (1U << 0)
+#define ZONEFS_ZONE_ACTIVE (1U << 1)
+#define ZONEFS_ZONE_OFFLINE (1U << 2)
+#define ZONEFS_ZONE_READONLY (1U << 3)
/*
* In-memory inode data.
--
2.31.1
The patch titled
Subject: maple_tree: make maple state reusable after mas_empty_area()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-make-maple-state-reusable-after-mas_empty_area.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: make maple state reusable after mas_empty_area()
Date: Fri, 5 May 2023 22:58:29 +0800
Make mas->min and mas->max point to a node range instead of a leaf entry
range. This allows mas to still be usable after mas_empty_area() returns.
Users would get unexpected results from other operations on the maple
state after calling the affected function.
For example, x86 MAP_32BIT mmap() acts as if there is no suitable gap when
there should be one.
Link: https://lkml.kernel.org/r/20230505145829.74574-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Reported-by: "Edgecombe, Rick P" <rick.p.edgecombe(a)intel.com>
Reported-by: Tad <support(a)spotco.us>
Reported-by: Michael Keyes <mgkeyes(a)vigovproductions.net>
Link: https://lore.kernel.org/linux-mm/32f156ba80010fd97dbaf0a0cdfc84366608624d.c…
Link: https://lore.kernel.org/linux-mm/e6108286ac025c268964a7ead3aab9899f9bc6e9.c…
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
--- a/lib/maple_tree.c~maple_tree-make-maple-state-reusable-after-mas_empty_area
+++ a/lib/maple_tree.c
@@ -5317,15 +5317,9 @@ int mas_empty_area(struct ma_state *mas,
mt = mte_node_type(mas->node);
pivots = ma_pivots(mas_mn(mas), mt);
- if (offset)
- mas->min = pivots[offset - 1] + 1;
-
- if (offset < mt_pivots[mt])
- mas->max = pivots[offset];
-
- if (mas->index < mas->min)
- mas->index = mas->min;
-
+ min = mas_safe_min(mas, pivots, offset);
+ if (mas->index < min)
+ mas->index = min;
mas->last = mas->index + size - 1;
return 0;
}
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
maple_tree-make-maple-state-reusable-after-mas_empty_area.patch
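For illustration, a hedged kernel-style sketch (not part of the patch) of the kind of caller this fix is about, loosely modeled on the mas_empty_area() + mas_next() gap-search pattern used for mmap; find_gap_then_peek() is a made-up helper, and locking (the tree spinlock or an external lock) is assumed to be handled by the caller.

#include <linux/kernel.h>
#include <linux/maple_tree.h>

/*
 * Illustration only (hypothetical helper): locate a gap of @size and then
 * keep using the same maple state.  Before this fix, mas_empty_area()
 * clamped mas->min/mas->max to the leaf slot, so a follow-up operation
 * such as mas_next() could walk from a bogus position; with the fix the
 * state remains coherent.
 */
static unsigned long find_gap_then_peek(struct maple_tree *mt,
					unsigned long size)
{
	MA_STATE(mas, mt, 0, 0);
	unsigned long gap;
	void *next;

	if (mas_empty_area(&mas, 0, ULONG_MAX, size))
		return -ENOMEM;

	/* mas.index/mas.last now describe a free range of @size. */
	gap = mas.index;

	/* The state is still usable: peek at the next in-tree entry. */
	next = mas_next(&mas, ULONG_MAX);
	pr_debug("gap of %lu at %lu, next entry %s\n",
		 size, gap, next ? "present" : "none");

	return gap;
}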
The patch titled
Subject: zsmalloc: move LRU update from zs_map_object() to zs_malloc()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
zsmalloc-move-lru-update-from-zs_map_object-to-zs_malloc.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Nhat Pham <nphamcs(a)gmail.com>
Subject: zsmalloc: move LRU update from zs_map_object() to zs_malloc()
Date: Fri, 5 May 2023 11:50:54 -0700
Under memory pressure, we sometimes observe the following crash:
[ 5694.832838] ------------[ cut here ]------------
[ 5694.842093] list_del corruption, ffff888014b6a448->next is LIST_POISON1 (dead000000000100)
[ 5694.858677] WARNING: CPU: 33 PID: 418824 at lib/list_debug.c:47 __list_del_entry_valid+0x42/0x80
[ 5694.961820] CPU: 33 PID: 418824 Comm: fuse_counters.s Kdump: loaded Tainted: G S 5.19.0-0_fbk3_rc3_hoangnhatpzsdynshrv41_10870_g85a9558a25de #1
[ 5694.990194] Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM16 05/24/2021
[ 5695.007072] RIP: 0010:__list_del_entry_valid+0x42/0x80
[ 5695.017351] Code: 08 48 83 c2 22 48 39 d0 74 24 48 8b 10 48 39 f2 75 2c 48 8b 51 08 b0 01 48 39 f2 75 34 c3 48 c7 c7 55 d7 78 82 e8 4e 45 3b 00 <0f> 0b eb 31 48 c7 c7 27 a8 70 82 e8 3e 45 3b 00 0f 0b eb 21 48 c7
[ 5695.054919] RSP: 0018:ffffc90027aef4f0 EFLAGS: 00010246
[ 5695.065366] RAX: 41fe484987275300 RBX: ffff888008988180 RCX: 0000000000000000
[ 5695.079636] RDX: ffff88886006c280 RSI: ffff888860060480 RDI: ffff888860060480
[ 5695.093904] RBP: 0000000000000002 R08: 0000000000000000 R09: ffffc90027aef370
[ 5695.108175] R10: 0000000000000000 R11: ffffffff82fdf1c0 R12: 0000000010000002
[ 5695.122447] R13: ffff888014b6a448 R14: ffff888014b6a420 R15: 00000000138dc240
[ 5695.136717] FS: 00007f23a7d3f740(0000) GS:ffff888860040000(0000) knlGS:0000000000000000
[ 5695.152899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5695.164388] CR2: 0000560ceaab6ac0 CR3: 000000001c06c001 CR4: 00000000007706e0
[ 5695.178659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5695.192927] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5695.207197] PKRU: 55555554
[ 5695.212602] Call Trace:
[ 5695.217486] <TASK>
[ 5695.221674] zs_map_object+0x91/0x270
[ 5695.229000] zswap_frontswap_store+0x33d/0x870
[ 5695.237885] ? do_raw_spin_lock+0x5d/0xa0
[ 5695.245899] __frontswap_store+0x51/0xb0
[ 5695.253742] swap_writepage+0x3c/0x60
[ 5695.261063] shrink_page_list+0x738/0x1230
[ 5695.269255] shrink_lruvec+0x5ec/0xcd0
[ 5695.276749] ? shrink_slab+0x187/0x5f0
[ 5695.284240] ? mem_cgroup_iter+0x6e/0x120
[ 5695.292255] shrink_node+0x293/0x7b0
[ 5695.299402] do_try_to_free_pages+0xea/0x550
[ 5695.307940] try_to_free_pages+0x19a/0x490
[ 5695.316126] __folio_alloc+0x19ff/0x3e40
[ 5695.323971] ? __filemap_get_folio+0x8a/0x4e0
[ 5695.332681] ? walk_component+0x2a8/0xb50
[ 5695.340697] ? generic_permission+0xda/0x2a0
[ 5695.349231] ? __filemap_get_folio+0x8a/0x4e0
[ 5695.357940] ? walk_component+0x2a8/0xb50
[ 5695.365955] vma_alloc_folio+0x10e/0x570
[ 5695.373796] ? walk_component+0x52/0xb50
[ 5695.381634] wp_page_copy+0x38c/0xc10
[ 5695.388953] ? filename_lookup+0x378/0xbc0
[ 5695.397140] handle_mm_fault+0x87f/0x1800
[ 5695.405157] do_user_addr_fault+0x1bd/0x570
[ 5695.413520] exc_page_fault+0x5d/0x110
[ 5695.421017] asm_exc_page_fault+0x22/0x30
After some investigation, I have found the following issue: unlike other
zswap backends, zsmalloc performs the LRU list update at the object
mapping time, rather than when the slot for the object is allocated.
This deviation was discussed and agreed upon during the review process
of the zsmalloc writeback patch series:
https://lore.kernel.org/lkml/Y3flcAXNxxrvy3ZH@cmpxchg.org/
Unfortunately, this introduces a subtle bug that occurs when there is a
concurrent store and reclaim, which interleave as follows:
zswap_frontswap_store()            shrink_worker()
  zs_malloc()                        zs_zpool_shrink()
    spin_lock(&pool->lock)             zs_reclaim_page()
    zspage = find_get_zspage()
    spin_unlock(&pool->lock)
                                         spin_lock(&pool->lock)
                                         zspage = list_first_entry(&pool->lru)
                                         list_del(&zspage->lru)
                                           zspage->lru.next = LIST_POISON1
                                           zspage->lru.prev = LIST_POISON2
                                         spin_unlock(&pool->lock)
  zs_map_object()
    spin_lock(&pool->lock)
    if (!list_empty(&zspage->lru))
      list_del(&zspage->lru)
        CHECK_DATA_CORRUPTION(next == LIST_POISON1) /* BOOM */
With the current upstream code, this issue rarely happens. zswap only
triggers writeback when the pool is already full, at which point all
further store attempts are short-circuited. This creates an implicit
pseudo-serialization between reclaim and store. I am working on a new
zswap shrinking mechanism, which makes interleaving reclaim and store
more likely, exposing this bug.
zbud and z3fold do not have this problem, because they perform the LRU
list update in the alloc function, while still holding the pool's lock.
This patch fixes the aforementioned bug by moving the LRU update back to
zs_malloc(), analogous to zbud and z3fold.
Link: https://lkml.kernel.org/r/20230505185054.2417128-1-nphamcs@gmail.com
Fixes: 64f768c6b32e ("zsmalloc: add a LRU to zs_pool to keep track of zspages in LRU order")
Signed-off-by: Nhat Pham <nphamcs(a)gmail.com>
Suggested-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Dan Streetman <ddstreet(a)ieee.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Nitin Gupta <ngupta(a)vflare.org>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Seth Jennings <sjenning(a)redhat.com>
Cc: Vitaly Wool <vitaly.wool(a)konsulko.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 36 +++++++++---------------------------
1 file changed, 9 insertions(+), 27 deletions(-)
--- a/mm/zsmalloc.c~zsmalloc-move-lru-update-from-zs_map_object-to-zs_malloc
+++ a/mm/zsmalloc.c
@@ -1331,31 +1331,6 @@ void *zs_map_object(struct zs_pool *pool
obj_to_location(obj, &page, &obj_idx);
zspage = get_zspage(page);
-#ifdef CONFIG_ZPOOL
- /*
- * Move the zspage to front of pool's LRU.
- *
- * Note that this is swap-specific, so by definition there are no ongoing
- * accesses to the memory while the page is swapped out that would make
- * it "hot". A new entry is hot, then ages to the tail until it gets either
- * written back or swaps back in.
- *
- * Furthermore, map is also called during writeback. We must not put an
- * isolated page on the LRU mid-reclaim.
- *
- * As a result, only update the LRU when the page is mapped for write
- * when it's first instantiated.
- *
- * This is a deviation from the other backends, which perform this update
- * in the allocation function (zbud_alloc, z3fold_alloc).
- */
- if (mm == ZS_MM_WO) {
- if (!list_empty(&zspage->lru))
- list_del(&zspage->lru);
- list_add(&zspage->lru, &pool->lru);
- }
-#endif
-
/*
* migration cannot move any zpages in this zspage. Here, pool->lock
* is too heavy since callers would take some time until they calls
@@ -1525,9 +1500,8 @@ unsigned long zs_malloc(struct zs_pool *
fix_fullness_group(class, zspage);
record_obj(handle, obj);
class_stat_inc(class, ZS_OBJS_INUSE, 1);
- spin_unlock(&pool->lock);
- return handle;
+ goto out;
}
spin_unlock(&pool->lock);
@@ -1550,6 +1524,14 @@ unsigned long zs_malloc(struct zs_pool *
/* We completely set up zspage so mark them as movable */
SetZsPageMovable(pool, zspage);
+out:
+#ifdef CONFIG_ZPOOL
+ /* Add/move zspage to beginning of LRU */
+ if (!list_empty(&zspage->lru))
+ list_del(&zspage->lru);
+ list_add(&zspage->lru, &pool->lru);
+#endif
+
spin_unlock(&pool->lock);
return handle;
_
Patches currently in -mm which might be from nphamcs(a)gmail.com are
zsmalloc-move-lru-update-from-zs_map_object-to-zs_malloc.patch
workingset-refactor-lru-refault-to-expose-refault-recency-check.patch
cachestat-implement-cachestat-syscall.patch
selftests-add-selftests-for-cachestat.patch
The self-refresh helper framework overloads "disable" to sometimes mean
"go into self-refresh mode," and this mode activates automatically
(e.g., after some period of unchanging display output). In such cases,
the display pipe is still considered "on", and user-space is not aware
that we went into self-refresh mode. Thus, users may expect that
vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
properly.
However, disable_outputs() currently triggers its WARN_ONCE() if a CRTC
driver tries to leave vblank enabled.
Add a different expectation: that CRTCs *should* leave vblank enabled
when going into self-refresh.
This patch is preparation for another patch -- "drm/rockchip: vop: Leave
vblank enabled in self-refresh" -- which resolves conflicts between the
above self-refresh behavior and the API tests in IGT's kms_vblank test
module.
== Some alternatives discussed: ==
It's likely that on many display controllers, vblank interrupts will
turn off when the CRTC is disabled, and so in some cases, self-refresh
may not support vblank. To support such cases, we might consider
additions to the generic helpers such that we fire vblank events based
on a timer.
However, there is currently only one driver using the common
self-refresh helpers (i.e., rockchip), and at least as of commit
bed030a49f3e ("drm/rockchip: Don't fully disable vop on self refresh"),
the CRTC hardware is powered enough to continue to generate vblank
interrupts.
So we chose the simpler option of leaving vblank interrupts enabled. We
can reevaluate this decision and perhaps augment the helpers if/when we
gain a second driver that has different requirements.
v3:
* include discussion summary
v2:
* add 'ret != 0' warning case for self-refresh
* describe failing test case and relation to drm/rockchip patch better
Cc: <stable(a)vger.kernel.org> # dependency for "drm/rockchip: vop: Leave
# vblank enabled in self-refresh"
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/gpu/drm/drm_atomic_helper.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..a22485e3e924 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1209,7 +1209,16 @@ disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
continue;
ret = drm_crtc_vblank_get(crtc);
- WARN_ONCE(ret != -EINVAL, "driver forgot to call drm_crtc_vblank_off()\n");
+ /*
+ * Self-refresh is not a true "disable"; ensure vblank remains
+ * enabled.
+ */
+ if (new_crtc_state->self_refresh_active)
+ WARN_ONCE(ret != 0,
+ "driver disabled vblank in self-refresh\n");
+ else
+ WARN_ONCE(ret != -EINVAL,
+ "driver forgot to call drm_crtc_vblank_off()\n");
if (ret == 0)
drm_crtc_vblank_put(crtc);
}
--
2.39.0.314.g84b9a713c41-goog
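For reference, a hedged sketch (not part of the patch) of how a CRTC driver's ->atomic_disable() hook could satisfy the new expectation; the foo_*() hardware helpers are hypothetical and only approximate the follow-up rockchip change referenced above.

#include <drm/drm_atomic.h>
#include <drm/drm_crtc.h>
#include <drm/drm_vblank.h>

/* Hypothetical hardware helpers, assumed to exist in the driver. */
static void foo_enter_self_refresh(struct drm_crtc *crtc);
static void foo_power_down_pipe(struct drm_crtc *crtc);

static void foo_crtc_atomic_disable(struct drm_crtc *crtc,
				    struct drm_atomic_state *state)
{
	struct drm_crtc_state *old_state =
		drm_atomic_get_old_crtc_state(state, crtc);

	if (old_state->self_refresh_active) {
		/*
		 * Not a real disable: keep the pipe and its vblank
		 * interrupts running so the WARN added above stays quiet
		 * and DRM_IOCTL_WAIT_VBLANK keeps working.
		 */
		foo_enter_self_refresh(crtc);
		return;
	}

	drm_crtc_vblank_off(crtc);
	foo_power_down_pipe(crtc);
}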
Make mas->min and mas->max point to a node range instead of a leaf entry
range. This allows mas to still be usable after mas_empty_area() returns.
Users would get unexpected results from other operations on the maple
state after calling the affected function.
Reported-by: "Edgecombe, Rick P" <rick.p.edgecombe(a)intel.com>
Reported-by: Tad <support(a)spotco.us>
Reported-by: Michael Keyes <mgkeyes(a)vigovproductions.net>
Link: https://lore.kernel.org/linux-mm/32f156ba80010fd97dbaf0a0cdfc84366608624d.c…
Link: https://lore.kernel.org/linux-mm/e6108286ac025c268964a7ead3aab9899f9bc6e9.c…
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
---
lib/maple_tree.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 110a36479dced..8ebc43d4cc8c5 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5317,15 +5317,9 @@ int mas_empty_area(struct ma_state *mas, unsigned long min,
mt = mte_node_type(mas->node);
pivots = ma_pivots(mas_mn(mas), mt);
- if (offset)
- mas->min = pivots[offset - 1] + 1;
-
- if (offset < mt_pivots[mt])
- mas->max = pivots[offset];
-
- if (mas->index < mas->min)
- mas->index = mas->min;
-
+ min = mas_safe_min(mas, pivots, offset);
+ if (mas->index < min)
+ mas->index = min;
mas->last = mas->index + size - 1;
return 0;
}
--
2.20.1
Do not update the min and max of the maple state to the slot of the leaf
node. Leaving min and max at the node range allows the maple state to be
used in other operations.
Users would get unexpected results from other operations on the maple
state after calling the affected function.
Reported-by: "Edgecombe, Rick P" <rick.p.edgecombe(a)intel.com>
Reported-by: Tad <support(a)spotco.us>
Reported-by: Michael Keyes <mgkeyes(a)vigovproductions.net>
Link: https://lore.kernel.org/linux-mm/32f156ba80010fd97dbaf0a0cdfc84366608624d.c…
Link: https://lore.kernel.org/linux-mm/e6108286ac025c268964a7ead3aab9899f9bc6e9.c…
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
lib/maple_tree.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 110a36479dced..1c4bc7a988ed3 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5285,10 +5285,6 @@ static inline int mas_sparse_area(struct ma_state *mas, unsigned long min,
int mas_empty_area(struct ma_state *mas, unsigned long min,
unsigned long max, unsigned long size)
{
- unsigned char offset;
- unsigned long *pivots;
- enum maple_type mt;
-
if (min >= max)
return -EINVAL;
@@ -5311,18 +5307,9 @@ int mas_empty_area(struct ma_state *mas, unsigned long min,
if (unlikely(mas_is_err(mas)))
return xa_err(mas->node);
- offset = mas->offset;
- if (unlikely(offset == MAPLE_NODE_SLOTS))
+ if (unlikely(mas->offset == MAPLE_NODE_SLOTS))
return -EBUSY;
- mt = mte_node_type(mas->node);
- pivots = ma_pivots(mas_mn(mas), mt);
- if (offset)
- mas->min = pivots[offset - 1] + 1;
-
- if (offset < mt_pivots[mt])
- mas->max = pivots[offset];
-
if (mas->index < mas->min)
mas->index = mas->min;
--
2.39.2