On 5/22/25 4:10 PM, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> f2fs: defer readonly check vs norecovery
>
> to the 6.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> f2fs-defer-readonly-check-vs-norecovery.patch
> and it can be found in the queue-6.14 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I already replied to the AUTOSEL email on 5/5 saying that this
is not a bug fix and should not be in the stable tree, but here we are.
> commit 442e4090bb78d5dce4506a591214ce2447d6ea50
> Author: Eric Sandeen <sandeen(a)redhat.com>
> Date: Mon Mar 3 11:12:17 2025 -0600
>
> f2fs: defer readonly check vs norecovery
>
> [ Upstream commit 9cca49875997a1a7e92800a828a62bacb0f577b9 ]
>
> Defer the readonly-vs-norecovery check until after option parsing is done
> so that option parsing does not require an active superblock for the test.
> Add a helpful message, while we're at it.
>
> (I think this could be moved back into parsing after we switch to the new
> mount API if desired, as the fs context will have RO state available.)
>
> Signed-off-by: Eric Sandeen <sandeen(a)redhat.com>
> Reviewed-by: Chao Yu <chao(a)kernel.org>
> Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index b8a0e925a4011..d3b04a589b525 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -728,10 +728,8 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
> set_opt(sbi, DISABLE_ROLL_FORWARD);
> break;
> case Opt_norecovery:
> - /* this option mounts f2fs with ro */
> + /* requires ro mount, checked in f2fs_default_check */
> set_opt(sbi, NORECOVERY);
> - if (!f2fs_readonly(sb))
> - return -EINVAL;
> break;
> case Opt_discard:
> if (!f2fs_hw_support_discard(sbi)) {
> @@ -1418,6 +1416,12 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
> f2fs_err(sbi, "Allow to mount readonly mode only");
> return -EROFS;
> }
> +
> + if (test_opt(sbi, NORECOVERY) && !f2fs_readonly(sbi->sb)) {
> + f2fs_err(sbi, "norecovery requires readonly mount");
> + return -EINVAL;
> + }
> +
> return 0;
> }
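As the parenthetical in the commit message notes, the check could move back
into parsing once f2fs switches to the new mount API. A hedged sketch of what
that could look like (editor's illustration only; the fs_context plumbing,
i.e. an fc variable reaching the parser, is an assumption, not part of this
patch):

	case Opt_norecovery:
		set_opt(sbi, NORECOVERY);
		/* assumed: fc is the fs_context available at parse time */
		if (!(fc->sb_flags & SB_RDONLY))
			return invalf(fc, "f2fs: norecovery requires readonly mount");
		break;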
>
>
> [...]
>
> Additionally, the only difference between before and after the fix is
> that the return value of make_request() is no longer handled. But after
> the previous patch cleaned up md_write_start(), make_request() only
> returns an error in raid5_make_request() by dm-raid, see commit
> 41425f96d7aa ("dm-raid456, md/raid456: fix a deadlock for dm-raid456
> while io concurrent with reshape"). Since dm always splits data and
> flush operations into two separate ios, the io size of a flush
> submitted by dm is always 0, so make_request() will not be called in
> md_submit_flush_data(). To prevent future modifications from
> introducing issues, add WARN_ON to ensure make_request() does not
> return an error in this context.
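For context, the branch being modified sits in a dispatch like this upstream
(editor's sketch, abbreviated; the zero-size check is why dm-submitted
flushes never reach make_request() here):

	if (bio->bi_iter.bi_size == 0) {
		/* a pure flush with no payload: complete it directly */
		bio_endio(bio);
	} else {
		bio->bi_opf &= ~REQ_PREFLUSH;
		/* ... make_request() path, see the hunk below ... */
	}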
>
> [...]
> @@ -560,8 +552,20 @@ static void md_submit_flush_data(struct work_struct *ws)
> bio_endio(bio);
> } else {
> bio->bi_opf &= ~REQ_PREFLUSH;
> - md_handle_request(mddev, bio);
> +
> + /*
> + * make_request() will never return an error here; it only
> + * returns an error in raid5_make_request() by dm-raid.
> + * Since dm always splits data and flush operations into
> + * two separate ios, the io size of a flush submitted by dm
> + * is always 0, so make_request() will not be called here.
> + */
> + if (WARN_ON_ONCE(!mddev->pers->make_request(mddev, bio)))
> + bio_io_error(bio);
> }
Hello,
It looks like we can hit this WARN_ON_ONCE(), after which the rootfs
switches to read-only:
May 20 15:13:35 hostname kernel: WARNING: CPU: 35 PID: 1517323 at drivers/md/md.c:621 md_submit_flush_data+0x9b/0xe0
...
May 20 15:13:35 hostname kernel: XFS (md125): log I/O error -5
May 20 15:13:35 hostname kernel: XFS (md125): Filesystem has been shut down due to log error (0x2).
May 20 15:13:35 hostname kernel: XFS (md125): Please unmount the filesystem and rectify the problem(s).
Can you double-check whether the following regression is real?
Since both the stable/linux-6.1.y and stable/linux-6.6.y branches don't
have b75197e86e6d ("md: Remove flush handling"), there is a minor issue
with this backport.
The statement "previous patch cleaned md_write_start(), make_request()
only returns error in raid5_make_request() by dm-raid" does not hold for
either branch, since 03e792eaf18e ("md: change the return value type of
md_write_start to void") was not backported.
So we should either backport it, or do error handling instead of the
WARN_ON_ONCE().
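For reference, a minimal sketch of the error-handling alternative for the
6.1.y/6.6.y backports (editor's illustration based on the quoted
md_submit_flush_data() hunk, untested):

	bio->bi_opf &= ~REQ_PREFLUSH;
	/*
	 * Without 03e792eaf18e, make_request() can still fail for reasons
	 * other than the dm-raid reshape case, so treat a failure as a
	 * normal I/O error rather than warning about a kernel bug.
	 */
	if (!mddev->pers->make_request(mddev, bio))
		bio_io_error(bio);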
--
Andrew Kanner
On 2025/5/23 07:31, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> btrfs: prevent inline data extents read from touching blocks beyond its range
>
> to the 6.12-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> btrfs-prevent-inline-data-extents-read-from-touching.patch
> and it can be found in the queue-6.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please drop this one from all stable trees.
Although the patch won't cause any behavior change, the main reason for
this patch is to prepare for the subpage optimization (and future large
folios support).
Thanks,
Qu
>
>
>
> commit 98504dd74a2688ff63dba6bf1d9f8abc7f0b322e
> Author: Qu Wenruo <wqu(a)suse.com>
> Date: Fri Nov 15 19:15:34 2024 +1030
>
> btrfs: prevent inline data extents read from touching blocks beyond its range
>
> [ Upstream commit 1a5b5668d711d3d1ef447446beab920826decec3 ]
>
> Currently reading an inline data extent will zero out the remaining
> range in the page.
>
> This is not yet causing problems even for block size < page size
> (subpage) cases because:
>
> 1) An inline data extent always starts at file offset 0
> Meaning at page read, we always read the inline extent first, before
> any other blocks in the page. Then later blocks are properly read out
> and re-fill the zeroed out ranges.
>
> 2) Currently btrfs will read out the whole page if a buffered write is
> not page aligned
> So a page is either fully uptodate at buffered write time (covers the
> whole page), or we will read out the whole page first.
> Meaning there is nothing to lose for such an inline extent read.
>
> But it's still not ideal:
>
> - We're zeroing out the page twice
> Once done by read_inline_extent()/uncompress_inline(), once done by
> btrfs_do_readpage() for ranges beyond i_size.
>
> - We're touching blocks that don't belong to the inline extent
> In the incoming patches, we can have a partially uptodate folio, in
> which some dirty blocks can exist while the folio is not fully uptodate:
>
> The page size is 16K and block size is 4K:
>
> 0         4K        8K        12K        16K
> |         |         |/////////|          |
>
> And range [8K, 12K) is dirtied by a buffered write, the remaining
> blocks are not uptodate.
>
> If range [0, 4K) contains an inline data extent, and we try to read
> the whole page, the current behavior will overwrite range [8K, 12K)
> with zero and cause data loss.
>
> So to make the behavior more consistent and in preparation for future
> changes, limit the inline data extents read to only zero out the range
> inside the first block, not the whole page.
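To make the numbers concrete, a worked example (editor's illustration with
assumed values, not from the patch): 16K page, 4K blocks, and a 1000-byte
uncompressed inline extent at file offset 0:

	/*
	 * Before: copy_size = min_t(u64, PAGE_SIZE, ram_bytes) = 1000, so
	 *   folio_zero_range(folio, 1000, 16384 - 1000)
	 * zeroes [1000, 16K) and would clobber dirty blocks such as [8K, 12K).
	 *
	 * After: copy_size = min_t(u64, blocksize, ram_bytes) = 1000, so
	 *   folio_zero_range(folio, 1000, 4096 - 1000)
	 * zeroes only [1000, 4K), staying inside the first block.
	 */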
>
> Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
> Signed-off-by: Qu Wenruo <wqu(a)suse.com>
> Signed-off-by: David Sterba <dsterba(a)suse.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0da2611fb9c85..ee8c18d298758 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6825,6 +6825,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> {
> int ret;
> struct extent_buffer *leaf = path->nodes[0];
> + const u32 blocksize = leaf->fs_info->sectorsize;
> char *tmp;
> size_t max_size;
> unsigned long inline_size;
> @@ -6841,7 +6842,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
>
> read_extent_buffer(leaf, tmp, ptr, inline_size);
>
> - max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> + max_size = min_t(unsigned long, blocksize, max_size);
> ret = btrfs_decompress(compress_type, tmp, folio, 0, inline_size,
> max_size);
>
> @@ -6853,8 +6854,8 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> * cover that region here.
> */
>
> - if (max_size < PAGE_SIZE)
> - folio_zero_range(folio, max_size, PAGE_SIZE - max_size);
> + if (max_size < blocksize)
> + folio_zero_range(folio, max_size, blocksize - max_size);
> kfree(tmp);
> return ret;
> }
> @@ -6862,6 +6863,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> static int read_inline_extent(struct btrfs_inode *inode, struct btrfs_path *path,
> struct folio *folio)
> {
> + const u32 blocksize = path->nodes[0]->fs_info->sectorsize;
> struct btrfs_file_extent_item *fi;
> void *kaddr;
> size_t copy_size;
> @@ -6876,14 +6878,14 @@ static int read_inline_extent(struct btrfs_inode *inode, struct btrfs_path *path
> if (btrfs_file_extent_compression(path->nodes[0], fi) != BTRFS_COMPRESS_NONE)
> return uncompress_inline(path, folio, fi);
>
> - copy_size = min_t(u64, PAGE_SIZE,
> + copy_size = min_t(u64, blocksize,
> btrfs_file_extent_ram_bytes(path->nodes[0], fi));
> kaddr = kmap_local_folio(folio, 0);
> read_extent_buffer(path->nodes[0], kaddr,
> btrfs_file_extent_inline_start(fi), copy_size);
> kunmap_local(kaddr);
> - if (copy_size < PAGE_SIZE)
> - folio_zero_range(folio, copy_size, PAGE_SIZE - copy_size);
> + if (copy_size < blocksize)
> + folio_zero_range(folio, copy_size, blocksize - copy_size);
> return 0;
> }
>
The patch titled
Subject: mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-hugetlb-fix-kernel-null-pointer-dereference-when-replacing-free-hugetlb-folios.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ge Yang <yangge1116(a)126.com>
Subject: mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
Date: Thu, 22 May 2025 11:22:17 +0800
A kernel crash was observed when replacing free hugetlb folios:
BUG: kernel NULL pointer dereference, address: 0000000000000028
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
Call Trace:
<TASK>
replace_free_hugepage_folios+0xb6/0x100
alloc_contig_range_noprof+0x18a/0x590
? srso_return_thunk+0x5/0x5f
? down_read+0x12/0xa0
? srso_return_thunk+0x5/0x5f
cma_range_alloc.constprop.0+0x131/0x290
__cma_alloc+0xcf/0x2c0
cma_alloc_write+0x43/0xb0
simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
debugfs_attr_write+0x46/0x70
full_proxy_write+0x62/0xa0
vfs_write+0xf8/0x420
? srso_return_thunk+0x5/0x5f
? filp_flush+0x86/0xa0
? srso_return_thunk+0x5/0x5f
? filp_close+0x1f/0x30
? srso_return_thunk+0x5/0x5f
? do_dup2+0xaf/0x160
? srso_return_thunk+0x5/0x5f
ksys_write+0x65/0xe0
do_syscall_64+0x64/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
There is a potential race between __update_and_free_hugetlb_folio() and
replace_free_hugepage_folios():
CPU1                                  CPU2
__update_and_free_hugetlb_folio       replace_free_hugepage_folios
                                      folio_test_hugetlb(folio)
                                      -- It's still a hugetlb folio.
__folio_clear_hugetlb(folio)
hugetlb_free_folio(folio)
                                      h = folio_hstate(folio)
                                      -- Here, h is a NULL pointer.
When the above race condition occurs, folio_hstate(folio) returns NULL,
and subsequent access to this NULL pointer will cause the system to crash.
To resolve this issue, execute folio_hstate(folio) under the protection
of hugetlb_lock, ensuring that folio_hstate(folio) does not return NULL.
Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.…
Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/hugetlb.c~mm-hugetlb-fix-kernel-null-pointer-dereference-when-replacing-free-hugetlb-folios
+++ a/mm/hugetlb.c
@@ -2949,12 +2949,20 @@ int replace_free_hugepage_folios(unsigne
while (start_pfn < end_pfn) {
folio = pfn_folio(start_pfn);
+
+ /*
+ * The folio might have been dissolved from under our feet, so make sure
+ * to carefully check the state under the lock.
+ */
+ spin_lock_irq(&hugetlb_lock);
if (folio_test_hugetlb(folio)) {
h = folio_hstate(folio);
} else {
+ spin_unlock_irq(&hugetlb_lock);
start_pfn++;
continue;
}
+ spin_unlock_irq(&hugetlb_lock);
if (!folio_ref_count(folio)) {
ret = alloc_and_dissolve_hugetlb_folio(h, folio,
_
Patches currently in -mm which might be from yangge1116(a)126.com are
mm-hugetlb-fix-kernel-null-pointer-dereference-when-replacing-free-hugetlb-folios.patch
The patch titled
Subject: mm: swap: fix potensial buffer overflow in setup_clusters()
has been added to the -mm mm-new branch. Its filename is
mm-swap-fix-potensial-buffer-overflow-in-setup_clusters.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kemeng Shi <shikemeng(a)huaweicloud.com>
Subject: mm: swap: fix potensial buffer overflow in setup_clusters()
Date: Thu, 22 May 2025 20:25:53 +0800
In setup_swap_map(), we only ensure badpages are in range (0, last_page].
As maxpages might be < last_page, setup_clusters() will encounter a buffer
overflow when a badpage is >= maxpages.
Only call inc_cluster_info_page() for badpages which are < maxpages to
fix the issue.
Link: https://lkml.kernel.org/r/20250522122554.12209-4-shikemeng@huaweicloud.com
Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs")
Signed-off-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Cc: <stable(a)vger.kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kairui Song <kasong(a)tencent.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/mm/swapfile.c~mm-swap-fix-potensial-buffer-overflow-in-setup_clusters
+++ a/mm/swapfile.c
@@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_c
* and the EOF part of the last cluster.
*/
inc_cluster_info_page(si, cluster_info, 0);
- for (i = 0; i < swap_header->info.nr_badpages; i++)
- inc_cluster_info_page(si, cluster_info,
- swap_header->info.badpages[i]);
+ for (i = 0; i < swap_header->info.nr_badpages; i++) {
+ unsigned int page_nr = swap_header->info.badpages[i];
+
+ if (page_nr >= maxpages)
+ continue;
+ inc_cluster_info_page(si, cluster_info, page_nr);
+ }
for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++)
inc_cluster_info_page(si, cluster_info, i);
_
Patches currently in -mm which might be from shikemeng(a)huaweicloud.com are
mm-shmem-avoid-unpaired-folio_unlock-in-shmem_swapin_folio.patch
mm-shmem-add-missing-shmem_unacct_size-in-__shmem_file_setup.patch
mm-shmem-fix-potential-dead-loop-in-shmem_unuse.patch
mm-shmem-only-remove-inode-from-swaplist-when-its-swapped-page-count-is-0.patch
mm-shmem-remove-unneeded-xa_is_value-check-in-shmem_unuse_swap_entries.patch
mm-swap-move-nr_swap_pages-counter-decrement-from-folio_alloc_swap-to-swap_range_alloc.patch
mm-swap-correctly-use-maxpages-in-swapon-syscall-to-avoid-potensial-deadloop.patch
mm-swap-fix-potensial-buffer-overflow-in-setup_clusters.patch
mm-swap-remove-stale-comment-stale-comment-in-cluster_alloc_swap_entry.patch
The patch titled
Subject: mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop
has been added to the -mm mm-new branch. Its filename is
mm-swap-correctly-use-maxpages-in-swapon-syscall-to-avoid-potensial-deadloop.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kemeng Shi <shikemeng(a)huaweicloud.com>
Subject: mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop
Date: Thu, 22 May 2025 20:25:52 +0800
We use maxpages from read_swap_header() to initialize swap_info_struct;
however, maxpages might be reduced in setup_swap_extents(), and si->max
is then assigned the reduced maxpages from setup_swap_extents().
Obviously, this could waste memory, as we allocated memory based on the
larger maxpages; besides, it could lead to a potential deadloop as
follows:
1) When calling setup_clusters() with the larger maxpages, unavailable
pages within range [si->max, larger maxpages) are not accounted with
inc_cluster_info_page(). As a result, these pages are assumed
available but cannot be allocated. The cluster containing these pages
can be moved to the frag_clusters list after all of its available
pages were allocated.
2) When the cluster mentioned in 1) is the only cluster in the
frag_clusters list, cluster_alloc_swap_entry() assumes order-0
allocation will never fail and enters a deadloop by repeatedly trying
to allocate a page from the only cluster in frag_clusters, which
contains no actually available pages.
Call setup_swap_extents() to get the final maxpages before
swap_info_struct initialization to fix the issue.
Link: https://lkml.kernel.org/r/20250522122554.12209-3-shikemeng@huaweicloud.com
Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned")
Signed-off-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Cc: <stable(a)vger.kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kairui Song <kasong(a)tencent.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 47 ++++++++++++++++++++---------------------------
1 file changed, 20 insertions(+), 27 deletions(-)
--- a/mm/swapfile.c~mm-swap-correctly-use-maxpages-in-swapon-syscall-to-avoid-potensial-deadloop
+++ a/mm/swapfile.c
@@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(st
return maxpages;
}
-static int setup_swap_map_and_extents(struct swap_info_struct *si,
- union swap_header *swap_header,
- unsigned char *swap_map,
- unsigned long maxpages,
- sector_t *span)
+static int setup_swap_map(struct swap_info_struct *si,
+ union swap_header *swap_header,
+ unsigned char *swap_map,
+ unsigned long maxpages)
{
- unsigned int nr_good_pages;
unsigned long i;
- int nr_extents;
-
- nr_good_pages = maxpages - 1; /* omit header page */
+ swap_map[0] = SWAP_MAP_BAD; /* omit header page */
for (i = 0; i < swap_header->info.nr_badpages; i++) {
unsigned int page_nr = swap_header->info.badpages[i];
if (page_nr == 0 || page_nr > swap_header->info.last_page)
return -EINVAL;
if (page_nr < maxpages) {
swap_map[page_nr] = SWAP_MAP_BAD;
- nr_good_pages--;
+ si->pages--;
}
}
- if (nr_good_pages) {
- swap_map[0] = SWAP_MAP_BAD;
- si->max = maxpages;
- si->pages = nr_good_pages;
- nr_extents = setup_swap_extents(si, span);
- if (nr_extents < 0)
- return nr_extents;
- nr_good_pages = si->pages;
- }
- if (!nr_good_pages) {
+ if (!si->pages) {
pr_warn("Empty swap-file\n");
return -EINVAL;
}
- return nr_extents;
+ return 0;
}
#define SWAP_CLUSTER_INFO_COLS \
@@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_c
* Mark unusable pages as unavailable. The clusters aren't
* marked free yet, so no list operations are involved yet.
*
- * See setup_swap_map_and_extents(): header page, bad pages,
+ * See setup_swap_map(): header page, bad pages,
* and the EOF part of the last cluster.
*/
inc_cluster_info_page(si, cluster_info, 0);
@@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __use
goto bad_swap_unlock_inode;
}
+ si->max = maxpages;
+ si->pages = maxpages - 1;
+ nr_extents = setup_swap_extents(si, &span);
+ if (nr_extents < 0) {
+ error = nr_extents;
+ goto bad_swap_unlock_inode;
+ }
+ maxpages = si->max;
+
/* OK, set up the swap map and apply the bad block list */
swap_map = vzalloc(maxpages);
if (!swap_map) {
@@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __use
if (error)
goto bad_swap_unlock_inode;
- nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
- maxpages, &span);
- if (unlikely(nr_extents < 0)) {
- error = nr_extents;
+ error = setup_swap_map(si, swap_header, swap_map, maxpages);
+ if (error)
goto bad_swap_unlock_inode;
- }
/*
* Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
_
Patches currently in -mm which might be from shikemeng(a)huaweicloud.com are
mm-shmem-avoid-unpaired-folio_unlock-in-shmem_swapin_folio.patch
mm-shmem-add-missing-shmem_unacct_size-in-__shmem_file_setup.patch
mm-shmem-fix-potential-dead-loop-in-shmem_unuse.patch
mm-shmem-only-remove-inode-from-swaplist-when-its-swapped-page-count-is-0.patch
mm-shmem-remove-unneeded-xa_is_value-check-in-shmem_unuse_swap_entries.patch
mm-swap-move-nr_swap_pages-counter-decrement-from-folio_alloc_swap-to-swap_range_alloc.patch
mm-swap-correctly-use-maxpages-in-swapon-syscall-to-avoid-potensial-deadloop.patch
mm-swap-fix-potensial-buffer-overflow-in-setup_clusters.patch
mm-swap-remove-stale-comment-stale-comment-in-cluster_alloc_swap_entry.patch
The patch titled
Subject: mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc()
has been added to the -mm mm-new branch. Its filename is
mm-swap-move-nr_swap_pages-counter-decrement-from-folio_alloc_swap-to-swap_range_alloc.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kemeng Shi <shikemeng(a)huaweicloud.com>
Subject: mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc()
Date: Thu, 22 May 2025 20:25:51 +0800
Patch series "Some random fixes and cleanups to swapfile".
Patches 1-3 are some random fixes. Patch 4 is a cleanup. More details
can be found in the respective patches.
This patch (of 4):
When folio_alloc_swap() encounters a failure in either
mem_cgroup_try_charge_swap() or add_to_swap_cache(), the nr_swap_pages
counter is not decremented for the allocated entry. However, the
following put_swap_folio() will increase the nr_swap_pages counter
without a matching decrease and lead to an imbalance.
Move the nr_swap_pages decrement from folio_alloc_swap() to
swap_range_alloc() to pair the nr_swap_pages counting.
Link: https://lkml.kernel.org/r/20250522122554.12209-1-shikemeng@huaweicloud.com
Link: https://lkml.kernel.org/r/20250522122554.12209-2-shikemeng@huaweicloud.com
Fixes: 0ff67f990bd45 ("mm, swap: remove swap slot cache")
Signed-off-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Reviewed-by: Kairui Song <kasong(a)tencent.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swap-move-nr_swap_pages-counter-decrement-from-folio_alloc_swap-to-swap_range_alloc
+++ a/mm/swapfile.c
@@ -1115,6 +1115,7 @@ static void swap_range_alloc(struct swap
if (vm_swap_full())
schedule_work(&si->reclaim_work);
}
+ atomic_long_sub(nr_entries, &nr_swap_pages);
}
static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
@@ -1313,7 +1314,6 @@ int folio_alloc_swap(struct folio *folio
if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL))
goto out_free;
- atomic_long_sub(size, &nr_swap_pages);
return 0;
out_free:
_
Patches currently in -mm which might be from shikemeng(a)huaweicloud.com are
mm-shmem-avoid-unpaired-folio_unlock-in-shmem_swapin_folio.patch
mm-shmem-add-missing-shmem_unacct_size-in-__shmem_file_setup.patch
mm-shmem-fix-potential-dead-loop-in-shmem_unuse.patch
mm-shmem-only-remove-inode-from-swaplist-when-its-swapped-page-count-is-0.patch
mm-shmem-remove-unneeded-xa_is_value-check-in-shmem_unuse_swap_entries.patch
mm-swap-move-nr_swap_pages-counter-decrement-from-folio_alloc_swap-to-swap_range_alloc.patch
mm-swap-correctly-use-maxpages-in-swapon-syscall-to-avoid-potensial-deadloop.patch
mm-swap-fix-potensial-buffer-overflow-in-setup_clusters.patch
mm-swap-remove-stale-comment-stale-comment-in-cluster_alloc_swap_entry.patch
On 2025/5/23 06:35, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> btrfs: prevent inline data extents read from touching blocks beyond its range
>
> to the 6.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> btrfs-prevent-inline-data-extents-read-from-touching.patch
> and it can be found in the queue-6.14 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please drop this patch.
This is again a preparation patch for btrfs large folios support, and
for the optimization to dirty a block without reading in the full page.
On its own this patch doesn't make any difference for older kernels and
should not be backported.
Thanks,
Qu
>
>
>
> commit 6a2d904623a8d1711b6b5065845d52cb3f2be60a
> Author: Qu Wenruo <wqu(a)suse.com>
> Date: Fri Nov 15 19:15:34 2024 +1030
>
> btrfs: prevent inline data extents read from touching blocks beyond its range
>
> [ Upstream commit 1a5b5668d711d3d1ef447446beab920826decec3 ]
>
> Currently reading an inline data extent will zero out the remaining
> range in the page.
>
> This is not yet causing problems even for block size < page size
> (subpage) cases because:
>
> 1) An inline data extent always starts at file offset 0
> Meaning at page read, we always read the inline extent first, before
> any other blocks in the page. Then later blocks are properly read out
> and re-fill the zeroed out ranges.
>
> 2) Currently btrfs will read out the whole page if a buffered write is
> not page aligned
> So a page is either fully uptodate at buffered write time (covers the
> whole page), or we will read out the whole page first.
> Meaning there is nothing to lose for such an inline extent read.
>
> But it's still not ideal:
>
> - We're zeroing out the page twice
> Once done by read_inline_extent()/uncompress_inline(), once done by
> btrfs_do_readpage() for ranges beyond i_size.
>
> - We're touching blocks that don't belong to the inline extent
> In the incoming patches, we can have a partially uptodate folio, in
> which some dirty blocks can exist while the folio is not fully uptodate:
>
> The page size is 16K and block size is 4K:
>
> 0         4K        8K        12K        16K
> |         |         |/////////|          |
>
> And range [8K, 12K) is dirtied by a buffered write, the remaining
> blocks are not uptodate.
>
> If range [0, 4K) contains an inline data extent, and we try to read
> the whole page, the current behavior will overwrite range [8K, 12K)
> with zero and cause data loss.
>
> So to make the behavior more consistent and in preparation for future
> changes, limit the inline data extents read to only zero out the range
> inside the first block, not the whole page.
>
> Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
> Signed-off-by: Qu Wenruo <wqu(a)suse.com>
> Signed-off-by: David Sterba <dsterba(a)suse.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 9a648fb130230..a7136311a13c6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6779,6 +6779,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> {
> int ret;
> struct extent_buffer *leaf = path->nodes[0];
> + const u32 blocksize = leaf->fs_info->sectorsize;
> char *tmp;
> size_t max_size;
> unsigned long inline_size;
> @@ -6795,7 +6796,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
>
> read_extent_buffer(leaf, tmp, ptr, inline_size);
>
> - max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> + max_size = min_t(unsigned long, blocksize, max_size);
> ret = btrfs_decompress(compress_type, tmp, folio, 0, inline_size,
> max_size);
>
> @@ -6807,14 +6808,15 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> * cover that region here.
> */
>
> - if (max_size < PAGE_SIZE)
> - folio_zero_range(folio, max_size, PAGE_SIZE - max_size);
> + if (max_size < blocksize)
> + folio_zero_range(folio, max_size, blocksize - max_size);
> kfree(tmp);
> return ret;
> }
>
> static int read_inline_extent(struct btrfs_path *path, struct folio *folio)
> {
> + const u32 blocksize = path->nodes[0]->fs_info->sectorsize;
> struct btrfs_file_extent_item *fi;
> void *kaddr;
> size_t copy_size;
> @@ -6829,14 +6831,14 @@ static int read_inline_extent(struct btrfs_path *path, struct folio *folio)
> if (btrfs_file_extent_compression(path->nodes[0], fi) != BTRFS_COMPRESS_NONE)
> return uncompress_inline(path, folio, fi);
>
> - copy_size = min_t(u64, PAGE_SIZE,
> + copy_size = min_t(u64, blocksize,
> btrfs_file_extent_ram_bytes(path->nodes[0], fi));
> kaddr = kmap_local_folio(folio, 0);
> read_extent_buffer(path->nodes[0], kaddr,
> btrfs_file_extent_inline_start(fi), copy_size);
> kunmap_local(kaddr);
> - if (copy_size < PAGE_SIZE)
> - folio_zero_range(folio, copy_size, PAGE_SIZE - copy_size);
> + if (copy_size < blocksize)
> + folio_zero_range(folio, copy_size, blocksize - copy_size);
> return 0;
> }
>