The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 38d11da522aacaa05898c734a1cec86f1e611129
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050716-eggnog-acutely-8d74@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
38d11da522aa ("dm: don't lock fs when the map is NULL in process of resume")
7533afa1d27b ("dm: send just one event on resize, not two")
84010e519f95 ("dm ima: measure data on device remove")
8eb6fab402e2 ("dm ima: measure data on device resume")
91ccbbac1747 ("dm ima: measure data on table load")
89f871af1b26 ("dm: delay registering the gendisk")
ba30585936b0 ("dm: move setting md->type into dm_setup_md_queue")
74a2b6ec9380 ("dm: cleanup cleanup_mapped_device")
2cfa582be800 ("Merge tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38d11da522aacaa05898c734a1cec86f1e611129 Mon Sep 17 00:00:00 2001
From: Li Lingfeng <lilingfeng3(a)huawei.com>
Date: Tue, 18 Apr 2023 16:38:04 +0800
Subject: [PATCH] dm: don't lock fs when the map is NULL in process of resume
Commit fa247089de99 ("dm: requeue IO if mapping table not yet available")
added a detection of whether the mapping table is available in the IO
submission process. If the mapping table is unavailable, it returns
BLK_STS_RESOURCE and requeues the IO.
This can lead to the following deadlock problem:
dm create mount
ioctl(DM_DEV_CREATE_CMD)
ioctl(DM_TABLE_LOAD_CMD)
do_mount
vfs_get_tree
ext4_get_tree
get_tree_bdev
sget_fc
alloc_super
// got &s->s_umount
down_write_nested(&s->s_umount, ...);
ext4_fill_super
ext4_load_super
ext4_read_bh
submit_bio
// submit and wait io end
ioctl(DM_DEV_SUSPEND_CMD)
dev_suspend
do_resume
dm_suspend
__dm_suspend
lock_fs
freeze_bdev
get_active_super
grab_super
// wait for &s->s_umount
down_write(&s->s_umount);
dm_swap_table
__bind
// set md->map(can't get here)
IO will be continuously requeued while holding the lock since mapping
table is NULL. At the same time, mapping table won't be set since the
lock is not available.
Like request-based DM, bio-based DM also has the same problem.
It's not proper to just abort IO if the mapping table not available.
So clear DM_SKIP_LOCKFS_FLAG when the mapping table is NULL, this
allows the DM table to be loaded and the IO submitted upon resume.
Fixes: fa247089de99 ("dm: requeue IO if mapping table not yet available")
Cc: stable(a)vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 7d5c9c582ed2..cc77cf3d4109 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1168,10 +1168,13 @@ static int do_resume(struct dm_ioctl *param)
/* Do we need to load a new map ? */
if (new_map) {
sector_t old_size, new_size;
+ int srcu_idx;
/* Suspend if it isn't already suspended */
- if (param->flags & DM_SKIP_LOCKFS_FLAG)
+ old_map = dm_get_live_table(md, &srcu_idx);
+ if ((param->flags & DM_SKIP_LOCKFS_FLAG) || !old_map)
suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
+ dm_put_live_table(md, srcu_idx);
if (param->flags & DM_NOFLUSH_FLAG)
suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
if (!dm_suspended_md(md))
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5a2f8d22ace4c8ac8798fab836dca7350fa710b1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050718-verdict-stegosaur-3aaa@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5a2f8d22ace4 ("mm/hugetlb: fix uffd-wp during fork()")
d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
ea4c353df377 ("mm/hugetlb: convert hugetlb_install_page to folios")
0ffdc38eb564 ("mm/hugetlb: convert restore_reserve_on_error() to folios")
e37d3e838d90 ("mm/hugetlb: convert alloc_migrate_huge_page to folios")
ff7d853b0313 ("mm/hugetlb: increase use of folios in alloc_huge_page()")
3a740e8bb56e ("mm/hugetlb: convert alloc_surplus_huge_page() to folios")
a36f1e902474 ("mm/hugetlb: convert dequeue_hugetlb_page functions to folios")
5b4bd90f9ac7 ("rmap: add folio parameter to __page_set_anon_rmap()")
db4e5dbdcdd5 ("mm: use a folio in hugepage_add_anon_rmap() and hugepage_add_new_anon_rmap()")
4d510f3da4c2 ("mm: add folio_add_new_anon_rmap()")
c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()")
19fc1a7e8b2b ("mm/hugetlb: change hugetlb allocation functions to return a folio")
7f325a8d2563 ("mm/hugetlb: convert free_gigantic_page() to folios")
2f6c57d696ab ("mm/hugetlb: convert add_hugetlb_page() to folios and add hugetlb_cma_folio()")
1a7cdab59b22 ("mm/hugetlb: convert dissolve_free_huge_page() to folios")
911565b82853 ("mm/hugetlb: convert destroy_compound_gigantic_page() to folios")
9fd330582b2f ("mm: add folio dtor and order setter functions")
4b51634cd16a ("mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped")
be5ef2d9b006 ("mm,thp,rmap: subpages_mapcount of PTE-mapped subpages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5a2f8d22ace4c8ac8798fab836dca7350fa710b1 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Mon, 17 Apr 2023 15:53:12 -0400
Subject: [PATCH] mm/hugetlb: fix uffd-wp during fork()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins",
v2.
This patch (of 6):
There're a bunch of things that were wrong:
- Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
rather than huge_pte_uffd_wp().
- When copying over a pte, we should drop uffd-wp bit when
!EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
- When doing early CoW for private hugetlb (e.g. when the parent page was
pinned), uffd-wp bit should be properly carried over if necessary.
No bug reported probably because most people do not even care about these
corner cases, but they are still bugs and can be exposed by the recent unit
tests introduced, so fix all of them in one shot.
Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mika Penttilä <mpenttil(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a08fb47fb200..d105b0b6a274 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
- struct folio *new_folio)
+ struct folio *new_folio, pte_t old)
{
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
__folio_mark_uptodate(new_folio);
hugepage_add_new_anon_rmap(new_folio, vma, addr);
- set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5032,14 +5036,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
- bool uffd_wp = huge_pte_uffd_wp(entry);
-
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
- bool uffd_wp = huge_pte_uffd_wp(entry);
+ bool uffd_wp = pte_swp_uffd_wp(entry);
if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5050,10 +5052,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
if (userfaultfd_wp(src_vma) && uffd_wp)
- entry = huge_pte_mkuffd_wp(entry);
+ entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
}
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
@@ -5118,7 +5120,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+ hugetlb_install_folio(dst_vma, dst_pte, addr,
+ new_folio, src_pte_old);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5136,6 +5139,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
entry = huge_pte_wrprotect(entry);
}
+ if (!userfaultfd_wp(dst_vma))
+ entry = huge_pte_clear_uffd_wp(entry);
+
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 5a2f8d22ace4c8ac8798fab836dca7350fa710b1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050717-related-online-6936@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
5a2f8d22ace4 ("mm/hugetlb: fix uffd-wp during fork()")
d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
ea4c353df377 ("mm/hugetlb: convert hugetlb_install_page to folios")
0ffdc38eb564 ("mm/hugetlb: convert restore_reserve_on_error() to folios")
e37d3e838d90 ("mm/hugetlb: convert alloc_migrate_huge_page to folios")
ff7d853b0313 ("mm/hugetlb: increase use of folios in alloc_huge_page()")
3a740e8bb56e ("mm/hugetlb: convert alloc_surplus_huge_page() to folios")
a36f1e902474 ("mm/hugetlb: convert dequeue_hugetlb_page functions to folios")
5b4bd90f9ac7 ("rmap: add folio parameter to __page_set_anon_rmap()")
db4e5dbdcdd5 ("mm: use a folio in hugepage_add_anon_rmap() and hugepage_add_new_anon_rmap()")
4d510f3da4c2 ("mm: add folio_add_new_anon_rmap()")
c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5a2f8d22ace4c8ac8798fab836dca7350fa710b1 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Mon, 17 Apr 2023 15:53:12 -0400
Subject: [PATCH] mm/hugetlb: fix uffd-wp during fork()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins",
v2.
This patch (of 6):
There're a bunch of things that were wrong:
- Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
rather than huge_pte_uffd_wp().
- When copying over a pte, we should drop uffd-wp bit when
!EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
- When doing early CoW for private hugetlb (e.g. when the parent page was
pinned), uffd-wp bit should be properly carried over if necessary.
No bug reported probably because most people do not even care about these
corner cases, but they are still bugs and can be exposed by the recent unit
tests introduced, so fix all of them in one shot.
Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mika Penttilä <mpenttil(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a08fb47fb200..d105b0b6a274 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
- struct folio *new_folio)
+ struct folio *new_folio, pte_t old)
{
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
__folio_mark_uptodate(new_folio);
hugepage_add_new_anon_rmap(new_folio, vma, addr);
- set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5032,14 +5036,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
- bool uffd_wp = huge_pte_uffd_wp(entry);
-
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
- bool uffd_wp = huge_pte_uffd_wp(entry);
+ bool uffd_wp = pte_swp_uffd_wp(entry);
if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5050,10 +5052,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
if (userfaultfd_wp(src_vma) && uffd_wp)
- entry = huge_pte_mkuffd_wp(entry);
+ entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
}
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
@@ -5118,7 +5120,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+ hugetlb_install_folio(dst_vma, dst_pte, addr,
+ new_folio, src_pte_old);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5136,6 +5139,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
entry = huge_pte_wrprotect(entry);
}
+ if (!userfaultfd_wp(dst_vma))
+ entry = huge_pte_clear_uffd_wp(entry);
+
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x f263a7c3a53b99f691b41e4925feb67f2e8544d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050732-spectator-outburst-40e8@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
f263a7c3a53b ("btrfs: reinterpret async discard iops_limit=0 as no delay")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f263a7c3a53b99f691b41e4925feb67f2e8544d0 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Wed, 5 Apr 2023 12:43:59 -0700
Subject: [PATCH] btrfs: reinterpret async discard iops_limit=0 as no delay
Currently, a limit of 0 results in a hard coded metering over 6 hours.
Since the default is a set limit, I suspect no one truly depends on this
rather arbitrary setting. Repurpose it for an arguably more useful
"unlimited" mode, where the delay is 0.
Note that if block groups are too new, or go fully empty, there is still
a delay associated with those conditions. Those delays implement
heuristics for not trimming a region we are relatively likely to fully
overwrite soon.
CC: stable(a)vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal(a)gompa.dev>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 0bc526f5fcd9..a6d77fe41e1a 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -56,8 +56,6 @@
#define BTRFS_DISCARD_DELAY (120ULL * NSEC_PER_SEC)
#define BTRFS_DISCARD_UNUSED_DELAY (10ULL * NSEC_PER_SEC)
-/* Target completion latency of discarding all discardable extents */
-#define BTRFS_DISCARD_TARGET_MSEC (6 * 60 * 60UL * MSEC_PER_SEC)
#define BTRFS_DISCARD_MIN_DELAY_MSEC (1UL)
#define BTRFS_DISCARD_MAX_DELAY_MSEC (1000UL)
#define BTRFS_DISCARD_MAX_IOPS (1000U)
@@ -577,6 +575,7 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
s32 discardable_extents;
s64 discardable_bytes;
u32 iops_limit;
+ unsigned long min_delay = BTRFS_DISCARD_MIN_DELAY_MSEC;
unsigned long delay;
discardable_extents = atomic_read(&discard_ctl->discardable_extents);
@@ -607,13 +606,19 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
}
iops_limit = READ_ONCE(discard_ctl->iops_limit);
- if (iops_limit)
+
+ if (iops_limit) {
delay = MSEC_PER_SEC / iops_limit;
- else
- delay = BTRFS_DISCARD_TARGET_MSEC / discardable_extents;
+ } else {
+ /*
+ * Unset iops_limit means go as fast as possible, so allow a
+ * delay of 0.
+ */
+ delay = 0;
+ min_delay = 0;
+ }
- delay = clamp(delay, BTRFS_DISCARD_MIN_DELAY_MSEC,
- BTRFS_DISCARD_MAX_DELAY_MSEC);
+ delay = clamp(delay, min_delay, BTRFS_DISCARD_MAX_DELAY_MSEC);
discard_ctl->delay_ms = delay;
spin_unlock(&discard_ctl->lock);
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x f263a7c3a53b99f691b41e4925feb67f2e8544d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050731-defensive-scariness-cea7@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
f263a7c3a53b ("btrfs: reinterpret async discard iops_limit=0 as no delay")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f263a7c3a53b99f691b41e4925feb67f2e8544d0 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Wed, 5 Apr 2023 12:43:59 -0700
Subject: [PATCH] btrfs: reinterpret async discard iops_limit=0 as no delay
Currently, a limit of 0 results in a hard coded metering over 6 hours.
Since the default is a set limit, I suspect no one truly depends on this
rather arbitrary setting. Repurpose it for an arguably more useful
"unlimited" mode, where the delay is 0.
Note that if block groups are too new, or go fully empty, there is still
a delay associated with those conditions. Those delays implement
heuristics for not trimming a region we are relatively likely to fully
overwrite soon.
CC: stable(a)vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal(a)gompa.dev>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 0bc526f5fcd9..a6d77fe41e1a 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -56,8 +56,6 @@
#define BTRFS_DISCARD_DELAY (120ULL * NSEC_PER_SEC)
#define BTRFS_DISCARD_UNUSED_DELAY (10ULL * NSEC_PER_SEC)
-/* Target completion latency of discarding all discardable extents */
-#define BTRFS_DISCARD_TARGET_MSEC (6 * 60 * 60UL * MSEC_PER_SEC)
#define BTRFS_DISCARD_MIN_DELAY_MSEC (1UL)
#define BTRFS_DISCARD_MAX_DELAY_MSEC (1000UL)
#define BTRFS_DISCARD_MAX_IOPS (1000U)
@@ -577,6 +575,7 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
s32 discardable_extents;
s64 discardable_bytes;
u32 iops_limit;
+ unsigned long min_delay = BTRFS_DISCARD_MIN_DELAY_MSEC;
unsigned long delay;
discardable_extents = atomic_read(&discard_ctl->discardable_extents);
@@ -607,13 +606,19 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
}
iops_limit = READ_ONCE(discard_ctl->iops_limit);
- if (iops_limit)
+
+ if (iops_limit) {
delay = MSEC_PER_SEC / iops_limit;
- else
- delay = BTRFS_DISCARD_TARGET_MSEC / discardable_extents;
+ } else {
+ /*
+ * Unset iops_limit means go as fast as possible, so allow a
+ * delay of 0.
+ */
+ delay = 0;
+ min_delay = 0;
+ }
- delay = clamp(delay, BTRFS_DISCARD_MIN_DELAY_MSEC,
- BTRFS_DISCARD_MAX_DELAY_MSEC);
+ delay = clamp(delay, min_delay, BTRFS_DISCARD_MAX_DELAY_MSEC);
discard_ctl->delay_ms = delay;
spin_unlock(&discard_ctl->lock);
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x cfe3445a58658b2a142a9ba5f5de69d91c56a297
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050725-customary-casing-68ab@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
cfe3445a5865 ("btrfs: set default discard iops_limit to 1000")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cfe3445a58658b2a142a9ba5f5de69d91c56a297 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Wed, 5 Apr 2023 12:43:58 -0700
Subject: [PATCH] btrfs: set default discard iops_limit to 1000
Previously, the default was a relatively conservative 10. This results
in a 100ms delay, so with ~300 discards in a commit, it takes the full
30s till the next commit to finish the discards. On a workstation, this
results in the disk never going idle, wasting power/battery, etc.
Set the default to 1000, which results in using the smallest possible
delay, currently, which is 1ms. This has shown to not pathologically
keep the disk busy by the original reporter.
Link: https://lore.kernel.org/linux-btrfs/Y%2F+n1wS%2F4XAH7X1p@nz/
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182228
CC: stable(a)vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal(a)gompa.dev
Signed-off-by: Boris Burkov <boris(a)bur.io>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 317aeff6c1da..0bc526f5fcd9 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -60,7 +60,7 @@
#define BTRFS_DISCARD_TARGET_MSEC (6 * 60 * 60UL * MSEC_PER_SEC)
#define BTRFS_DISCARD_MIN_DELAY_MSEC (1UL)
#define BTRFS_DISCARD_MAX_DELAY_MSEC (1000UL)
-#define BTRFS_DISCARD_MAX_IOPS (10U)
+#define BTRFS_DISCARD_MAX_IOPS (1000U)
/* Monotonically decreasing minimum length filters after index 0 */
static int discard_minlen[BTRFS_NR_DISCARD_LISTS] = {
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x cfe3445a58658b2a142a9ba5f5de69d91c56a297
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050724-striving-matter-4a91@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
cfe3445a5865 ("btrfs: set default discard iops_limit to 1000")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cfe3445a58658b2a142a9ba5f5de69d91c56a297 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Wed, 5 Apr 2023 12:43:58 -0700
Subject: [PATCH] btrfs: set default discard iops_limit to 1000
Previously, the default was a relatively conservative 10. This results
in a 100ms delay, so with ~300 discards in a commit, it takes the full
30s till the next commit to finish the discards. On a workstation, this
results in the disk never going idle, wasting power/battery, etc.
Set the default to 1000, which results in using the smallest possible
delay, currently, which is 1ms. This has shown to not pathologically
keep the disk busy by the original reporter.
Link: https://lore.kernel.org/linux-btrfs/Y%2F+n1wS%2F4XAH7X1p@nz/
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182228
CC: stable(a)vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal(a)gompa.dev
Signed-off-by: Boris Burkov <boris(a)bur.io>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 317aeff6c1da..0bc526f5fcd9 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -60,7 +60,7 @@
#define BTRFS_DISCARD_TARGET_MSEC (6 * 60 * 60UL * MSEC_PER_SEC)
#define BTRFS_DISCARD_MIN_DELAY_MSEC (1UL)
#define BTRFS_DISCARD_MAX_DELAY_MSEC (1000UL)
-#define BTRFS_DISCARD_MAX_IOPS (10U)
+#define BTRFS_DISCARD_MAX_IOPS (1000U)
/* Monotonically decreasing minimum length filters after index 0 */
static int discard_minlen[BTRFS_NR_DISCARD_LISTS] = {