After we enabled MGLRU on our 384-core, 1536 GB production servers, we encountered frequent soft lockups attributed to scanning folios.
The soft lockup is as follows:
[Sat Feb 24 02:29:42 2024] watchdog: BUG: soft lockup - CPU#215 stuck for 111s! [kworker/215:0:2200100]
[Sat Feb 24 02:29:42 2024] Call Trace:
[Sat Feb 24 02:29:42 2024]  <IRQ>
[Sat Feb 24 02:29:42 2024]  ? show_regs.cold+0x1a/0x1f
[Sat Feb 24 02:29:42 2024]  ? watchdog_timer_fn+0x1c4/0x220
[Sat Feb 24 02:29:42 2024]  ? softlockup_fn+0x30/0x30
[Sat Feb 24 02:29:42 2024]  ? __hrtimer_run_queues+0xa2/0x2b0
[Sat Feb 24 02:29:42 2024]  ? hrtimer_interrupt+0x109/0x220
[Sat Feb 24 02:29:42 2024]  ? __sysvec_apic_timer_interrupt+0x5e/0x110
[Sat Feb 24 02:29:42 2024]  ? sysvec_apic_timer_interrupt+0x7b/0x90
[Sat Feb 24 02:29:42 2024]  </IRQ>
[Sat Feb 24 02:29:42 2024]  <TASK>
[Sat Feb 24 02:29:42 2024]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[Sat Feb 24 02:29:42 2024]  ? folio_end_writeback+0x73/0xa0
[Sat Feb 24 02:29:42 2024]  ? folio_rotate_reclaimable+0x8c/0x90
[Sat Feb 24 02:29:42 2024]  ? folio_rotate_reclaimable+0x57/0x90
[Sat Feb 24 02:29:42 2024]  ? folio_rotate_reclaimable+0x8c/0x90
[Sat Feb 24 02:29:42 2024]  folio_end_writeback+0x73/0xa0
[Sat Feb 24 02:29:42 2024]  iomap_finish_ioend+0x1d4/0x420
[Sat Feb 24 02:29:42 2024]  iomap_finish_ioends+0x5e/0xe0
[Sat Feb 24 02:29:42 2024]  xfs_end_ioend+0x65/0x150 [xfs]
[Sat Feb 24 02:29:42 2024]  xfs_end_io+0xbc/0xf0 [xfs]
[Sat Feb 24 02:29:42 2024]  process_one_work+0x1ec/0x3c0
[Sat Feb 24 02:29:42 2024]  worker_thread+0x4d/0x390
[Sat Feb 24 02:29:42 2024]  ? process_one_work+0x3c0/0x3c0
[Sat Feb 24 02:29:42 2024]  kthread+0xee/0x120
[Sat Feb 24 02:29:42 2024]  ? kthread_complete_and_exit+0x20/0x20
[Sat Feb 24 02:29:42 2024]  ret_from_fork+0x1f/0x30
[Sat Feb 24 02:29:42 2024]  </TASK>
Analysis of the vmcore captured at the soft lockup shows that the stuck thread was waiting for the spinlock lruvec->lru_lock:
PID: 2200100  TASK: ffff9a221d8b4000  CPU: 215  COMMAND: "kworker/215:0"
 #0 [fffffe000319ae20] crash_nmi_callback at ffffffff8e055419
 #1 [fffffe000319ae58] nmi_handle at ffffffff8e0253c0
 #2 [fffffe000319aea0] default_do_nmi at ffffffff8eae5985
 #3 [fffffe000319aec8] exc_nmi at ffffffff8eae5b78
 #4 [fffffe000319aef0] end_repeat_nmi at ffffffff8ec015f0
    [exception RIP: queued_spin_lock_slowpath+59]
    RIP: ffffffff8eaf9b8b  RSP: ffffb58b01d4fc20  RFLAGS: 00000002
    RAX: 0000000000000001  RBX: ffffb58b01d4fc90  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000001  RDI: ffff99d2b6ff9050
    RBP: ffffb58b01d4fc40   R8: 0000000000035b21   R9: 0000000000000040
    R10: 0000000000035b00  R11: 0000000000000001  R12: ffff99d2b6ff9050
    R13: 0000000000000046  R14: ffffffff8e28bd30  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffffb58b01d4fc20] queued_spin_lock_slowpath at ffffffff8eaf9b8b
 #6 [ffffb58b01d4fc48] _raw_spin_lock_irqsave at ffffffff8eaf9b11
 #7 [ffffb58b01d4fc68] folio_lruvec_lock_irqsave at ffffffff8e337a82
 #8 [ffffb58b01d4fc88] folio_batch_move_lru at ffffffff8e28dbcf
 #9 [ffffb58b01d4fcd0] folio_batch_add_and_move at ffffffff8e28dce7
#10 [ffffb58b01d4fce0] folio_rotate_reclaimable at ffffffff8e28eee7
#11 [ffffb58b01d4fcf8] folio_end_writeback at ffffffff8e27bfb3
#12 [ffffb58b01d4fd10] iomap_finish_ioend at ffffffff8e3d9d04
#13 [ffffb58b01d4fd98] iomap_finish_ioends at ffffffff8e3d9fae
#14 [ffffb58b01d4fde0] xfs_end_ioend at ffffffffc0fae835 [xfs]
#15 [ffffb58b01d4fe20] xfs_end_io at ffffffffc0fae9dc [xfs]
#16 [ffffb58b01d4fe60] process_one_work at ffffffff8e0ae08c
#17 [ffffb58b01d4feb0] worker_thread at ffffffff8e0ae2ad
#18 [ffffb58b01d4ff10] kthread at ffffffff8e0b671e
#19 [ffffb58b01d4ff50] ret_from_fork at ffffffff8e002dcf
The spinlock (RDI: ffff99d2b6ff9050) was held by a task that was scanning folios:
PID: 2400713  TASK: ffff996be1d14000  CPU: 50  COMMAND: "chitu_main"
--- <NMI exception stack> ---
 #5 [ffffb58b14ef76e8] __mod_zone_page_state at ffffffff8e2a9c36
 #6 [ffffb58b14ef76f0] folio_inc_gen at ffffffff8e2990bd
 #7 [ffffb58b14ef7740] sort_folio at ffffffff8e29afbb
 #8 [ffffb58b14ef7748] sysvec_apic_timer_interrupt at ffffffff8eae79f0
 #9 [ffffb58b14ef77b0] scan_folios at ffffffff8e29b49b
#10 [ffffb58b14ef7878] evict_folios at ffffffff8e29bb53
#11 [ffffb58b14ef7968] lru_gen_shrink_lruvec at ffffffff8e29cb57
#12 [ffffb58b14ef7a28] shrink_lruvec at ffffffff8e29e135
#13 [ffffb58b14ef7af0] shrink_node at ffffffff8e29e78c
#14 [ffffb58b14ef7b88] do_try_to_free_pages at ffffffff8e29ec08
#15 [ffffb58b14ef7bf8] try_to_free_mem_cgroup_pages at ffffffff8e2a17a6
#16 [ffffb58b14ef7ca8] try_charge_memcg at ffffffff8e338879
#17 [ffffb58b14ef7d48] charge_memcg at ffffffff8e3394f8
#18 [ffffb58b14ef7d70] __mem_cgroup_charge at ffffffff8e33aded
#19 [ffffb58b14ef7d98] do_anonymous_page at ffffffff8e2c6523
#20 [ffffb58b14ef7dd8] __handle_mm_fault at ffffffff8e2cc27d
#21 [ffffb58b14ef7e78] handle_mm_fault at ffffffff8e2cc3ba
#22 [ffffb58b14ef7eb8] do_user_addr_fault at ffffffff8e073a99
#23 [ffffb58b14ef7f20] exc_page_fault at ffffffff8eae82f7
#24 [ffffb58b14ef7f50] asm_exc_page_fault at ffffffff8ec00bb7
There were a total of 22 tasks waiting for this spinlock (RDI: ffff99d2b6ff9050):
crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
22
Two other threads were also scanning folios, one with 19 waiters and the other with 15.
To address this issue under heavy reclaim, we deployed a hotfix that adds a cond_resched() to scan_folios(). After applying this hotfix to our servers, the soft lockups stopped.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: stable@vger.kernel.org # 6.1+
---
 mm/vmscan.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..8f2877285b9a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		spin_unlock_irq(&lruvec->lru_lock);
+		cond_resched();
+		spin_lock_irq(&lruvec->lru_lock);
 	}

 	if (skipped_zone) {
On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
After we enabled MGLRU on our 384-core, 1536 GB production servers, we encountered frequent soft lockups attributed to scanning folios.
The soft lockup is as follows:
...
There were a total of 22 tasks waiting for this spinlock (RDI: ffff99d2b6ff9050):
crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
22
If we're holding the lock for this long then there's a possibility of getting hit by the NMI watchdog also.
Two other threads were also scanning folios, one with 19 waiters and the other with 15.
To address this issue under heavy reclaim, we deployed a hotfix that adds a cond_resched() to scan_folios(). After applying this hotfix to our servers, the soft lockups stopped.
...
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		spin_unlock_irq(&lruvec->lru_lock);
+		cond_resched();
+		spin_lock_irq(&lruvec->lru_lock);
 	}
Presumably wrapping this with `if (need_resched())' will save some work.
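A minimal sketch of that guard, for illustration only (it assumes the same loop body as the patch above; as the discussion below shows, dropping the lock here is not safe without further changes):

		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
			break;

		/* Only pay the unlock/relock cost when a resched is pending. */
		if (need_resched()) {
			spin_unlock_irq(&lruvec->lru_lock);
			cond_resched();
			spin_lock_irq(&lruvec->lru_lock);
		}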
This lock is held for a reason. I'd like to see an analysis of why this change is safe.
On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
After we enabled MGLRU on our 384-core, 1536 GB production servers, we encountered frequent soft lockups attributed to scanning folios.
The soft lockup is as follows:
...
There were a total of 22 tasks waiting for this spinlock (RDI: ffff99d2b6ff9050):
crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
22
If we're holding the lock for this long then there's a possibility of getting hit by the NMI watchdog also.
The NMI watchdog is disabled because these servers are KVM guests:

kernel.nmi_watchdog = 0
kernel.soft_watchdog = 1
Two other threads were also scanning folios, one with 19 waiters and the other with 15.
To address this issue under heavy reclaim, we deployed a hotfix that adds a cond_resched() to scan_folios(). After applying this hotfix to our servers, the soft lockups stopped.
...
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		spin_unlock_irq(&lruvec->lru_lock);
+		cond_resched();
+		spin_lock_irq(&lruvec->lru_lock);
 	}
Presumably wrapping this with `if (need_resched())' will save some work.
Good suggestion.
This lock is held for a reason. I'd like to see an analysis of why this change is safe.
I believe the key point here is whether we can reduce the scope of this lock from:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
to:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	spin_unlock_irq(&lruvec->lru_lock);

	spin_lock_irq(&lruvec->lru_lock);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
isolate_folios() merely uses min_seq to retrieve the generation; it does not modify it. If multiple tasks run evict_folios() concurrently, it seems inconsequential which of them ends up incrementing min_seq. I'd appreciate Yu's confirmation on this.
On Fri, Mar 08, 2024 at 04:57:08PM +0800, Yafang Shao wrote:
On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
After we enabled MGLRU on our 384-core, 1536 GB production servers, we encountered frequent soft lockups attributed to scanning folios.
The soft lockup is as follows:
...
There were a total of 22 tasks waiting for this spinlock (RDI: ffff99d2b6ff9050):
crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
22
If we're holding the lock for this long then there's a possibility of getting hit by the NMI watchdog also.
The NMI watchdog is disabled because these servers are KVM guests:

kernel.nmi_watchdog = 0
kernel.soft_watchdog = 1
Two other threads were also scanning folios, one with 19 waiters and the other with 15.
To address this issue under heavy reclaim, we deployed a hotfix that adds a cond_resched() to scan_folios(). After applying this hotfix to our servers, the soft lockups stopped.
...
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		spin_unlock_irq(&lruvec->lru_lock);
+		cond_resched();
+		spin_lock_irq(&lruvec->lru_lock);
 	}
Presumably wrapping this with `if (need_resched())' will save some work.
Good suggestion.
This lock is held for a reason. I'd like to see an analysis of why this change is safe.
I believe the key point here is whether we can reduce the scope of this lock from:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
to:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	spin_unlock_irq(&lruvec->lru_lock);

	spin_lock_irq(&lruvec->lru_lock);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
isolate_folios() merely uses min_seq to retrieve the generation; it does not modify it. If multiple tasks run evict_folios() concurrently, it seems inconsequential which of them ends up incrementing min_seq. I'd appreciate Yu's confirmation on this.
Hi Yafang,
Thanks for the patch!
Yes, your second analysis is correct -- we can't just drop the lock as the original patch does, because min_seq can be updated in the meantime. If this happens, the gen value becomes invalid, since it's based on the expired min_seq:
sort_folio()
{
	...
	gen = lru_gen_from_seq(lrugen->min_seq[type]);
	...
}
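For reference, lru_gen_from_seq() just maps a sequence number onto a generation slot, roughly (paraphrasing include/linux/mm_inline.h):

static inline int lru_gen_from_seq(unsigned long seq)
{
	/* Generations are indexed modulo MAX_NR_GENS. */
	return seq % MAX_NR_GENS;
}

So a gen computed from a min_seq that another task has since advanced would index a generation that is no longer the oldest.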
The following might be a better approach (untested):
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..6fe53cfa8ef8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			skipped_zone += delta;
 		}

-		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
+		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
+		    spin_is_contended(&lruvec->lru_lock))
 			break;
 	}

@@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		skipped += skipped_zone;
 	}

-	if (!remaining || isolated >= MIN_LRU_BATCH)
+	if (!remaining || isolated >= MIN_LRU_BATCH ||
+	    (scanned && spin_is_contended(&lruvec->lru_lock)))
 		break;
 }
On Tue, Mar 12, 2024 at 02:29:48PM -0600, Yu Zhao wrote:
On Fri, Mar 08, 2024 at 04:57:08PM +0800, Yafang Shao wrote:
On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
After we enabled MGLRU on our 384-core, 1536 GB production servers, we encountered frequent soft lockups attributed to scanning folios.
The soft lockup is as follows:
...
There were a total of 22 tasks waiting for this spinlock (RDI: ffff99d2b6ff9050):
crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
22
If we're holding the lock for this long then there's a possibility of getting hit by the NMI watchdog also.
The NMI watchdog is disabled because these servers are KVM guests:

kernel.nmi_watchdog = 0
kernel.soft_watchdog = 1
Two other threads were also scanning folios, one with 19 waiters and the other with 15.
To address this issue under heavy reclaim, we deployed a hotfix that adds a cond_resched() to scan_folios(). After applying this hotfix to our servers, the soft lockups stopped.
...
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		spin_unlock_irq(&lruvec->lru_lock);
+		cond_resched();
+		spin_lock_irq(&lruvec->lru_lock);
 	}
Presumably wrapping this with `if (need_resched())' will save some work.
Good suggestion.
This lock is held for a reason. I'd like to see an analysis of why this change is safe.
I believe the key point here is whether we can reduce the scope of this lock from:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
to:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	spin_unlock_irq(&lruvec->lru_lock);

	spin_lock_irq(&lruvec->lru_lock);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
isolate_folios() merely uses min_seq to retrieve the generation; it does not modify it. If multiple tasks run evict_folios() concurrently, it seems inconsequential which of them ends up incrementing min_seq. I'd appreciate Yu's confirmation on this.
Hi Yafang,
Thanks for the patch!
Yes, your second analysis is correct -- we can't just drop the lock as the original patch does, because min_seq can be updated in the meantime. If this happens, the gen value becomes invalid, since it's based on the expired min_seq:
sort_folio()
{
	...
	gen = lru_gen_from_seq(lrugen->min_seq[type]);
	...
}
The following might be a better approach (untested):
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..6fe53cfa8ef8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			skipped_zone += delta;
 		}

-		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
+		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
+		    spin_is_contended(&lruvec->lru_lock))
 			break;
 	}

@@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		skipped += skipped_zone;
 	}

-	if (!remaining || isolated >= MIN_LRU_BATCH)
+	if (!remaining || isolated >= MIN_LRU_BATCH ||
+	    (scanned && spin_is_contended(&lruvec->lru_lock)))
 		break;
 }
A better way might be:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..ac59f064c4e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		if (need_resched() || spin_is_contended(&lruvec->lru_lock)) {
+			remaining = 0;
+			break;
+		}
 	}

 	if (skipped_zone) {
On Wed, Mar 13, 2024 at 6:12 AM Yu Zhao <yuzhao@google.com> wrote:
On Tue, Mar 12, 2024 at 02:29:48PM -0600, Yu Zhao wrote:
On Fri, Mar 08, 2024 at 04:57:08PM +0800, Yafang Shao wrote:
On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
After we enabled MGLRU on our 384-core, 1536 GB production servers, we encountered frequent soft lockups attributed to scanning folios.
The soft lockup is as follows:
...
There were a total of 22 tasks waiting for this spinlock (RDI: ffff99d2b6ff9050):
crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
22
If we're holding the lock for this long then there's a possibility of getting hit by the NMI watchdog also.
The NMI watchdog is disabled because these servers are KVM guests:

kernel.nmi_watchdog = 0
kernel.soft_watchdog = 1
Two other threads were also scanning folios, one with 19 waiters and the other with 15.
To address this issue under heavy reclaim, we deployed a hotfix that adds a cond_resched() to scan_folios(). After applying this hotfix to our servers, the soft lockups stopped.
...
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		spin_unlock_irq(&lruvec->lru_lock);
+		cond_resched();
+		spin_lock_irq(&lruvec->lru_lock);
 	}
Presumably wrapping this with `if (need_resched())' will save some work.
Good suggestion.
This lock is held for a reason. I'd like to see an analysis of why this change is safe.
I believe the key point here is whether we can reduce the scope of this lock from:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
to:
evict_folios
	spin_lock_irq(&lruvec->lru_lock);
	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
	spin_unlock_irq(&lruvec->lru_lock);

	spin_lock_irq(&lruvec->lru_lock);
	scanned += try_to_inc_min_seq(lruvec, swappiness);
	if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
		scanned = 0;
	spin_unlock_irq(&lruvec->lru_lock);
isolate_folios() merely uses min_seq to retrieve the generation; it does not modify it. If multiple tasks run evict_folios() concurrently, it seems inconsequential which of them ends up incrementing min_seq. I'd appreciate Yu's confirmation on this.
Hi Yafang,
Thanks for the patch!
Yes, your second analysis is correct -- we can't just drop the lock as the original patch does, because min_seq can be updated in the meantime. If this happens, the gen value becomes invalid, since it's based on the expired min_seq:
sort_folio()
{
	...
	gen = lru_gen_from_seq(lrugen->min_seq[type]);
	...
}
The following might be a better approach (untested):
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..6fe53cfa8ef8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			skipped_zone += delta;
 		}

-		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
+		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
+		    spin_is_contended(&lruvec->lru_lock))
 			break;
 	}

@@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		skipped += skipped_zone;
 	}

-	if (!remaining || isolated >= MIN_LRU_BATCH)
+	if (!remaining || isolated >= MIN_LRU_BATCH ||
+	    (scanned && spin_is_contended(&lruvec->lru_lock)))
 		break;
 }
A better way might be:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..ac59f064c4e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		if (need_resched() || spin_is_contended(&lruvec->lru_lock)) {
+			remaining = 0;
+			break;
+		}
 	}

 	if (skipped_zone) {
It is better. Thanks for your suggestion. I will verify it on our production servers, which may take several days.