This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
unmounting an ocfs2 volume").
In commit dfe6c5692fb5, the statement in the commit log "This bug has
existed since the initial OCFS2 code." is wrong. The commit that actually
introduced the bug is 30dd3478c3cd ("ocfs2: correctly use
ocfs2_find_next_zero_bit()").
Commit dfe6c5692fb5 does provide a correct fix for the latest kernel;
however, it shouldn't be pushed to stable branches. Use this commit to
revert dfe6c5692fb5 from all branches that include it, and use a new fix
method to address the bug introduced by 30dd3478c3cd.
Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume")
Signed-off-by: Heming Zhao <heming.zhao(a)suse.com>
Cc: <stable(a)vger.kernel.org>
---
fs/ocfs2/localalloc.c | 19 -------------------
1 file changed, 19 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 8ac42ea81a17..5df34561c551 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
start = bit_off + 1;
}
- /* clear the contiguous bits until the end boundary */
- if (count) {
- blkno = la_start_blk +
- ocfs2_clusters_to_blocks(osb->sb,
- start - count);
-
- trace_ocfs2_sync_local_to_main_free(
- count, start - count,
- (unsigned long long)la_start_blk,
- (unsigned long long)blkno);
-
- status = ocfs2_release_clusters(handle,
- main_bm_inode,
- main_bm_bh, blkno,
- count);
- if (status < 0)
- mlog_errno(status);
- }
-
bail:
if (status)
mlog_errno(status);
--
2.43.0
The function do_otp_read() does not set the output parameter *retlen,
which is expected to contain the number of bytes actually read.
As a result, in onenand_otp_walk(), the tmp_retlen variable remains
uninitialized after calling do_otp_read() and is then used to change
the values of the buf, len and retlen variables.
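A minimal userspace sketch of the contract the fix restores, assuming a simplified walk loop: the callback must always report through *retlen how many bytes it handled, because the caller advances buf, len and retlen by exactly that value. The names sketch_otp_read(), sketch_otp_walk(), the fake 8-byte OTP area and the 4-byte chunking are all hypothetical and do not reproduce onenand_otp_walk()'s real block iteration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: must report bytes handled via *retlen. */
typedef int (*sketch_otp_action_t)(size_t from, size_t len, size_t *retlen,
				   unsigned char *buf);

/* Read action modelled on the fixed do_otp_read(): *retlen is always set. */
static int sketch_otp_read(size_t from, size_t len, size_t *retlen,
			   unsigned char *buf)
{
	static const unsigned char otp[8] = "OTPDATA"; /* fake 8-byte OTP area */
	size_t avail = from < sizeof(otp) ? sizeof(otp) - from : 0;
	size_t n = len < avail ? len : avail;

	if (n)
		memcpy(buf, otp + from, n);
	*retlen = n;	/* the equivalent of "*retlen = ops.retlen;" */
	return 0;
}

/*
 * Simplified walk: advances by whatever the action reports, so an unset
 * tmp_retlen would corrupt buf, from, len and *retlen.
 */
static int sketch_otp_walk(size_t from, size_t len, size_t *retlen,
			   unsigned char *buf, sketch_otp_action_t action)
{
	*retlen = 0;
	while (len > 0) {
		size_t tmp_retlen;
		size_t chunk = len < 4 ? len : 4; /* pretend 4-byte blocks */
		int ret = action(from, chunk, &tmp_retlen, buf);

		if (ret)
			return ret;
		if (tmp_retlen == 0)
			break;	/* end of the OTP area */
		buf += tmp_retlen;
		from += tmp_retlen;
		len -= tmp_retlen;
		*retlen += tmp_retlen;
	}
	return 0;
}
```

Asking this sketch for 10 bytes yields a total of 8, the size of the fake OTP area; with the `*retlen = n;` line removed, the loop would advance by an indeterminate value.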
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 49dc08eeda70 ("[MTD] [OneNAND] fix numerous races")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ivan Stepchenko <sid(a)itb.spb.ru>
---
drivers/mtd/nand/onenand/onenand_base.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/mtd/nand/onenand/onenand_base.c b/drivers/mtd/nand/onenand/onenand_base.c
index f66385faf631..0dc2ea4fc857 100644
--- a/drivers/mtd/nand/onenand/onenand_base.c
+++ b/drivers/mtd/nand/onenand/onenand_base.c
@@ -2923,6 +2923,7 @@ static int do_otp_read(struct mtd_info *mtd, loff_t from, size_t len,
ret = ONENAND_IS_4KB_PAGE(this) ?
onenand_mlc_read_ops_nolock(mtd, from, &ops) :
onenand_read_ops_nolock(mtd, from, &ops);
+ *retlen = ops.retlen;
/* Exit OTP access mode */
this->command(mtd, ONENAND_CMD_RESET, 0, 0);
--
2.34.1
On Thu, Dec 5, 2024 at 6:42 PM Christian Göttsche
<cgzones(a)googlemail.com> wrote:
>
> Dec 5, 2024 02:09:39 Thiébaud Weksteen <tweek(a)google.com>:
>
> > When evaluating extended permissions, ignore unknown permissions instead
> > of calling BUG(). This commit ensures that future permissions can be
> > added without interfering with older kernels.
> >
> > Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
> > Cc: stable(a)vger.kernel.org
> > Signed-off-by: Thiébaud Weksteen <tweek(a)google.com>
> > - BUG();
> > + pr_warn_once(
> > + "SELinux: unknown extended permission (%u) will be ignored\n",
> > + node->datum.u.xperms->specified);
> > + return;
> > }
>
> What about instead of logging once per boot at access decision time logging once per policyload at parse time, like suggested for patch https://patchwork.kernel.org/project/selinux/patch/20241115133619.114393-11… ?
>
I agree, warning when the policy is loaded makes more sense. For this
particular bug, I am trying to keep the patch to a bare minimum as I
intend to backport it to stable kernels (on Android, this is
preventing us from deploying a policy compatible with both older and
newer kernels). Maybe we could land the first version of this patch
(without any warning message), with the understanding that your patch
will land soon after?
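The behaviour change in the quoted hunk can be sketched as a switch whose default branch warns once and returns without touching the decision, instead of calling BUG(). This is a hypothetical userspace model: the enum value, struct sketch_decision and sketch_update_xperms() are illustrative stand-ins, not the SELinux avtab types, and fprintf() plus a static flag stand in for pr_warn_once().

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative value only; not the kernel's AVTAB_XPERMS_* constants. */
enum { SKETCH_XPERMS_IOCTL = 1 };

struct sketch_decision {
	unsigned int ioctl_allowed;	/* toy stand-in for the ioctl bitmap */
};

static void sketch_update_xperms(struct sketch_decision *d,
				 unsigned int specified, unsigned int perms)
{
	static bool warned;	/* emulates pr_warn_once() */

	switch (specified) {
	case SKETCH_XPERMS_IOCTL:
		d->ioctl_allowed |= perms;
		break;
	default:
		/* Previously BUG(); a kind from a newer policy is now ignored. */
		if (!warned) {
			fprintf(stderr,
				"unknown extended permission (%u) will be ignored\n",
				specified);
			warned = true;
		}
		return;
	}
}
```

An older kernel running this logic simply leaves the decision unchanged for a kind it does not recognise, which is what lets one policy serve both older and newer kernels.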
On 2024/12/4 19:01, Matthieu Baerts wrote:
> Hi MoYuanhao,
>
> +Cc MPTCP mailing list.
>
> (Please cc the MPTCP list next time)
>
> On 04/12/2024 09:58, MoYuanhao wrote:
>> Ensure enough space before adding MPTCP options in tcp_syn_options()
>> Added a check to verify sufficient remaining space
>> before inserting MPTCP options in SYN packets.
>> This prevents issues when space is insufficient.
>
> Thank you for this patch. I'm surprised we all missed this check, but
> yes it is missing.
>
> As mentioned by Eric in his previous email, please add a 'Fixes' tag.
> For bug-fixes, you should also Cc stable and target 'net', not 'net-next':
>
> Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing
> connections")
> Cc: stable(a)vger.kernel.org
>
>
> Regarding the code, it looks OK to me, as we did exactly that with
> mptcp_synack_options(). In mptcp_established_options(), we pass
> 'remaining' because many MPTCP options can be set, but not here. So I
> guess that's fine to keep the code like that, especially for the 'net' tree.
>
>
> Also, and linked to Eric's email, did you have an issue with that, or is
> it to prevent issues in the future?
>
>
> One last thing, please don’t repost your patches within one 24h period, see:
>
> https://docs.kernel.org/process/maintainer-netdev.html
>
>
> Because the code is OK to me, and the same patch has already been sent
> twice to the netdev ML within a few hours, I'm going to apply this patch
> in our MPTCP tree with the suggested modifications. Later on, we will
> send it for inclusion in the net tree.
>
> pw-bot: awaiting-upstream
>
> (Not sure this pw-bot instruction will work as no net/mptcp/* files have
> been modified)
>
> Cheers,
> Matt
Hi Matt,
Thank you for your feedback!
I have made the suggested updates to the patch (version 2):
- I've added the Fixes tag and Cc'd the stable(a)vger.kernel.org list.
- The target branch has been adjusted to net as per your suggestion.
I will make sure to Cc the MPTCP list in future submissions.
Regarding your question, this patch was created to prevent potential
issues related to insufficient space for MPTCP options in the future. I
didn't encounter a specific issue, but it seemed like a necessary
safeguard to ensure robustness when handling SYN packets with MPTCP options.
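A minimal sketch of the kind of check under discussion, assuming options are reserved from a single `remaining` byte budget that starts at the 40 bytes the TCP header allows for options: an option is added only if it fits, otherwise it is skipped rather than overflowing the option area. struct sketch_opts and sketch_add_option() are hypothetical names, not the tcp_syn_options() or proposed mptcp_set_option() code.

```c
#include <stdbool.h>

#define SKETCH_MAX_OPTION_SPACE 40	/* 60-byte max header minus 20 fixed */

struct sketch_opts {
	unsigned int remaining;	/* bytes of option space still free */
	unsigned int flags;	/* which options were actually added */
};

/* Reserve @size bytes for option @flag only if they fit. */
static bool sketch_add_option(struct sketch_opts *o, unsigned int flag,
			      unsigned int size)
{
	if (o->remaining < size)
		return false;	/* not enough room: skip the option */
	o->flags |= flag;
	o->remaining -= size;
	return true;
}
```

With 36 bytes already consumed by other options, an 8-byte option is refused and the budget is left untouched.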
Additionally, I have made further optimizations to the patch, which are
included in the attached version. I believe it would be more elegant to
introduce a new function, mptcp_set_option(), similar to
mptcp_set_option_cond(), to handle MPTCP options.
This is my first time replying to a message in a Linux mailing list, so
if there are any formatting issues or mistakes, please point them out
and I will make sure to correct them in future submissions.
Thanks again for your review and suggestions. Looking forward to seeing
the patch applied to the MPTCP tree and later inclusion in the net tree.
Best regards,
MoYuanhao
From: Kairui Song <kasong(a)tencent.com>
Setting a zero-sized block device as the backing device is pointless, and
one can easily create a recursive loop by setting the uninitialized
ZRAM device as its own backing device (here zram0 is uninitialized):
echo /dev/zram0 > /sys/block/zram0/backing_dev
This is clearly an invalid configuration: the module will pin itself,
and the kernel should refuse it in the first place.
Refusing zero-sized devices prevents misuse cases such as the one
above.
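A minimal sketch of the check, assuming 4 KiB pages. Because the page count is the device size shifted right by PAGE_SHIFT, the test rejects not only empty devices but also any device smaller than one page; an uninitialized zram device (size 0) used as its own backing device falls into the same case. sketch_backing_pages() is a hypothetical helper, not zram code.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Returns the number of pages, or -EINVAL if the device cannot back zram. */
static int64_t sketch_backing_pages(uint64_t i_size)
{
	uint64_t nr_pages = i_size >> SKETCH_PAGE_SHIFT;

	/* Empty device, sub-page device, or uninitialized zram pointing at itself. */
	if (!nr_pages)
		return -EINVAL;
	return (int64_t)nr_pages;
}
```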
Fixes: 013bf95a83ec ("zram: add interface to specif backing device")
Reported-by: Desheng Wu <deshengwu(a)tencent.com>
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Cc: stable(a)vger.kernel.org
---
drivers/block/zram/zram_drv.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0ca6d55c9917..dd48df5b97c8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -614,6 +614,12 @@ static ssize_t backing_dev_store(struct device *dev,
}
nr_pages = i_size_read(inode) >> PAGE_SHIFT;
+ /* Refuse to use zero sized device (also prevents self reference) */
+ if (!nr_pages) {
+ err = -EINVAL;
+ goto out;
+ }
+
bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
if (!bitmap) {
--
2.47.0
[BUG]
There are at least two problems when run_delalloc_nocow() hits an
error and has to go through the cleanup routine:
- It doesn't clear the folio dirty flags of any successfully run range
This breaks the regular error handling protocol for folio writeback,
which should clear the dirty flag of the failed range.
This cleanup protocol is adopted by both iomap and btrfs (if the error
happened at the very beginning of the whole delalloc range).
- It can start writeback on, or unlock, folios which are already unlocked
This is done by calling extent_clear_unlock_delalloc() with the
PAGE_START_WRITEBACK or PAGE_UNLOCK flag.
This will trigger the VM_BUG_ON() in folio_start_writeback(), which
requires the folio to be locked.
[CAUSE]
The problem of not clearing the folio dirty flags is a common bug shared
between cow_file_range() and run_delalloc_nocow().
We just need to clear the folio dirty flags according to the @cur_offset
cursor.
The extent_clear_unlock_delalloc() call on already unlocked folios is
caused by double error handling: once in cow_file_range() (inside
fallback_to_cow()), and once in run_delalloc_nocow() itself.
[FIX]
- Clear folio dirty for range [@start, @cur_offset)
Introduce a helper, cleanup_dirty_folios(), which finds and locks the
folios in the range, clears the dirty flag and starts/ends writeback,
with extra handling for @locked_folio.
- Introduce a variable, @cow_end, to record the end of the last failed
COW range
This is to track which range we should skip, to avoid double
unlocking.
- Skip the failed COW range during error handling
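The cursor arithmetic of the new error path can be modelled in plain C to see which byte ranges each cleanup step covers. This mirrors the `cur_offset > start` and `cow_end` logic in the hunk below; that the remaining range [cur_offset, end] then goes to extent_clear_unlock_delalloc() is an assumption, since that code lies outside the hunk. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Half-open byte range [lo, hi); empty when lo == hi. */
struct sketch_range {
	uint64_t lo;
	uint64_t hi;
};

struct sketch_error_plan {
	struct sketch_range clear_dirty; /* passed to cleanup_dirty_folios() */
	struct sketch_range unlock;      /* assumed extent_clear_unlock_delalloc() range */
};

/*
 * @end is inclusive, as in run_delalloc_nocow(); @cow_end == 0 means no
 * fallback_to_cow() range failed.
 */
static struct sketch_error_plan sketch_plan(uint64_t start, uint64_t cur_offset,
					    uint64_t cow_end, uint64_t end)
{
	struct sketch_error_plan p = { { start, start }, { 0, 0 } };

	if (cur_offset > start)
		p.clear_dirty.hi = cur_offset;	/* [start, cur_offset) */
	if (cow_end)
		cur_offset = cow_end + 1;	/* cow_file_range() cleaned this up */
	p.unlock.lo = cur_offset;
	p.unlock.hi = cur_offset <= end ? end + 1 : cur_offset;
	return p;
}
```

For start = 0, cur_offset = 8192, cow_end = 16383 and end = 32767, the model clears dirty flags for [0, 8192) and unlocks only [16384, 32768), never touching the failed COW range.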
Cc: stable(a)vger.kernel.org
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
Changelog:
v3:
- Fix the double error handling on the COW range
Which can lead to VM_BUG_ON() for extent_clear_unlock_delalloc(), as
the folio is already unlocked by the error handling inside
cow_file_range().
- Update the commit message to explain the bug better
- Add a comment inside the error handling explaining the error patterns
v2:
- Fix the incorrect @cur_offset assignment to @end
The @end is not aligned to sector size, nor should @cur_offset be
updated before fallback_to_cow() succeeds.
- Add one extra ASSERT() to make sure the range is properly aligned
---
fs/btrfs/inode.c | 93 ++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 86 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9517fb2df649..069599b025a6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1969,6 +1969,48 @@ static int can_nocow_file_extent(struct btrfs_path *path,
return ret < 0 ? ret : can_nocow;
}
+static void cleanup_dirty_folios(struct btrfs_inode *inode,
+ struct folio *locked_folio,
+ u64 start, u64 end, int error)
+{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
+ struct address_space *mapping = inode->vfs_inode.i_mapping;
+ pgoff_t start_index = start >> PAGE_SHIFT;
+ pgoff_t end_index = end >> PAGE_SHIFT;
+ u32 len;
+
+ ASSERT(end + 1 - start < U32_MAX);
+ ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+ IS_ALIGNED(end + 1, fs_info->sectorsize));
+ len = end + 1 - start;
+
+ /*
+ * Handle the locked folio first.
+ * btrfs_folio_clamp_*() helpers can handle range out of the folio case.
+ */
+ btrfs_folio_clamp_clear_dirty(fs_info, locked_folio, start, len);
+ btrfs_folio_clamp_set_writeback(fs_info, locked_folio, start, len);
+ btrfs_folio_clamp_clear_writeback(fs_info, locked_folio, start, len);
+
+ for (pgoff_t index = start_index; index <= end_index; index++) {
+ struct folio *folio;
+
+ /* Already handled at the beginning. */
+ if (index == locked_folio->index)
+ continue;
+ folio = __filemap_get_folio(mapping, index, FGP_LOCK, GFP_NOFS);
+ /* Cache already dropped, no need to do any cleanup. */
+ if (IS_ERR(folio))
+ continue;
+ btrfs_folio_clamp_clear_dirty(fs_info, folio, start, len);
+ btrfs_folio_clamp_set_writeback(fs_info, folio, start, len);
+ btrfs_folio_clamp_clear_writeback(fs_info, folio, start, len);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+ mapping_set_error(mapping, error);
+}
+
/*
* when nowcow writeback call back. This checks for snapshots or COW copies
* of the extents that exist in the file, and COWs the file as required.
@@ -1984,6 +2026,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
struct btrfs_root *root = inode->root;
struct btrfs_path *path;
u64 cow_start = (u64)-1;
+ /*
+ * If not 0, represents the inclusive end of the last fallback_to_cow()
+ * range. Only for error handling.
+ */
+ u64 cow_end = 0;
u64 cur_offset = start;
int ret;
bool check_prev = true;
@@ -2144,6 +2191,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
found_key.offset - 1);
cow_start = (u64)-1;
if (ret) {
+ cow_end = found_key.offset - 1;
btrfs_dec_nocow_writers(nocow_bg);
goto error;
}
@@ -2217,11 +2265,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
cow_start = cur_offset;
if (cow_start != (u64)-1) {
- cur_offset = end;
ret = fallback_to_cow(inode, locked_folio, cow_start, end);
cow_start = (u64)-1;
- if (ret)
+ if (ret) {
+ cow_end = end;
goto error;
+ }
}
btrfs_free_path(path);
@@ -2229,12 +2278,42 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
error:
/*
- * If an error happened while a COW region is outstanding, cur_offset
- * needs to be reset to cow_start to ensure the COW region is unlocked
- * as well.
+ * There are several error cases:
+ *
+ * 1) Failed without falling back to COW
+ * start cur_start end
+ * |/////////////| |
+ *
+ * For range [start, cur_start) the folios are already unlocked (except
+ * @locked_folio), EXTENT_DELALLOC already removed.
+ * Only need to clear the dirty flag as they will never be submitted.
+ * Ordered extent and extent maps are handled by
+ * btrfs_mark_ordered_io_finished() inside run_delalloc_range().
+ *
+ * 2) Failed with error from fallback_to_cow()
+ * start cur_start cow_end end
+ * |/////////////|-----------| |
+ *
+ * For range [start, cur_start) it's the same as case 1).
+ * But for range [cur_start, cow_end), the folios have dirty flag
+ * cleared and unlocked, EXTENT_DELALLOC cleared.
+ * There may or may not be any ordered extents/extent maps allocated.
+ *
+ * We should not call extent_clear_unlock_delalloc() on range [cur_start,
+ * cow_end), as the folios are already unlocked.
+ *
+ * So clear the folio dirty flags for [start, cur_offset) first.
*/
- if (cow_start != (u64)-1)
- cur_offset = cow_start;
+ if (cur_offset > start)
+ cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+
+ /*
+ * If an error happened while a COW region is outstanding, cur_offset
+ * needs to be reset to @cow_end + 1 to skip the COW range, as
+ * cow_file_range() will do the proper cleanup at error.
+ */
+ if (cow_end)
+ cur_offset = cow_end + 1;
/*
* We need to lock the extent here because we're clearing DELALLOC and
--
2.47.1
Just like cow_file_range(), since day 1 btrfs hasn't really cleared the
dirty flags on error if an ordered extent was created successfully.
Per the error handling protocol (as followed by iomap, and by btrfs
when it fails at the beginning of the range), we should clear all dirty
flags for the involved folios.
Otherwise the range of that folio will still be marked dirty, but has no
EXTENT_DELALLOC set inside the io tree.
Since the folio range is still dirty, it will still be a target for
the next writeback, but since there is no EXTENT_DELALLOC, no new
ordered extent will be created for it.
This means the writeback of that folio range will fall back to the COW
fixup path. However, the COW fixup path itself is being re-evaluated, as
the newly introduced pin_user_pages_*() should prevent us from hitting
out-of-band dirty folios, and we're moving to deprecate the COW fixup
path.
We already have an experimental patch that makes the COW fixup path
crash, to verify there are no such out-of-band dirty folios anymore.
So here we need to avoid going through the COW fixup path by doing
proper folio dirty flag cleanup.
Unlike the fix in cow_file_range(), which holds the folio and extent
lock until an error or a fully successful run, here we have no such
luxury, as we can fall back to COW, and in that case the extent/folio
range will be unlocked by cow_file_range().
So here we introduce a new helper, cleanup_dirty_folios(), to clear the
dirty flags for the involved folios.
And since the final fallback_to_cow() call can also fail, and we rely on
@cur_offset to do the proper cleanup, here we remove the unnecessary and
incorrect @cur_offset assignment.
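cleanup_dirty_folios() walks page indices from start >> PAGE_SHIFT to end >> PAGE_SHIFT and passes the same (start, len) pair to every folio, relying on the btrfs_folio_clamp_*() helpers to trim it. The sketch below models that trimming, assuming one 4 KiB page per folio and a hypothetical 1 KiB sector size; sketch_clamp() and sketch_walk() are illustrative, not btrfs functions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE (1ULL << SKETCH_PAGE_SHIFT)

/*
 * Intersect [start, start + len) with the page at @index. Stores the
 * clamped start in *cstart and returns the clamped length (0 if the
 * range misses the page).
 */
static uint64_t sketch_clamp(uint64_t index, uint64_t start, uint64_t len,
			     uint64_t *cstart)
{
	uint64_t page_lo = index << SKETCH_PAGE_SHIFT;
	uint64_t page_hi = page_lo + SKETCH_PAGE_SIZE;
	uint64_t lo = start > page_lo ? start : page_lo;
	uint64_t hi = start + len < page_hi ? start + len : page_hi;

	*cstart = lo;
	return hi > lo ? hi - lo : 0;
}

/* Sum the clamped lengths over the same index loop; @end is inclusive. */
static uint64_t sketch_walk(uint64_t start, uint64_t end, uint32_t sectorsize)
{
	uint64_t total = 0, cstart;

	/* Mirrors the alignment ASSERT() in cleanup_dirty_folios(). */
	assert(start % sectorsize == 0 && (end + 1) % sectorsize == 0);
	for (uint64_t index = start >> SKETCH_PAGE_SHIFT;
	     index <= end >> SKETCH_PAGE_SHIFT; index++)
		total += sketch_clamp(index, start, end + 1 - start, &cstart);
	return total;
}
```

Summing the clamped pieces over the loop recovers exactly len, so partially covered first and last folios are handled without any special-casing in the caller.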
Cc: stable(a)vger.kernel.org
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
Changelog:
v2:
- Fix the incorrect @cur_offset assignment to @end
The @end is not aligned to sector size, nor should @cur_offset be
updated before fallback_to_cow() succeeds.
- Add one extra ASSERT() to make sure the range is properly aligned
---
fs/btrfs/inode.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 58 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e8232ac7917f..92df6dfff2e4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1969,6 +1969,48 @@ static int can_nocow_file_extent(struct btrfs_path *path,
return ret < 0 ? ret : can_nocow;
}
+static void cleanup_dirty_folios(struct btrfs_inode *inode,
+ struct folio *locked_folio,
+ u64 start, u64 end, int error)
+{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
+ struct address_space *mapping = inode->vfs_inode.i_mapping;
+ pgoff_t start_index = start >> PAGE_SHIFT;
+ pgoff_t end_index = end >> PAGE_SHIFT;
+ u32 len;
+
+ ASSERT(end + 1 - start < U32_MAX);
+ ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+ IS_ALIGNED(end + 1, fs_info->sectorsize));
+ len = end + 1 - start;
+
+ /*
+ * Handle the locked folio first.
+ * btrfs_folio_clamp_*() helpers can handle range out of the folio case.
+ */
+ btrfs_folio_clamp_clear_dirty(fs_info, locked_folio, start, len);
+ btrfs_folio_clamp_set_writeback(fs_info, locked_folio, start, len);
+ btrfs_folio_clamp_clear_writeback(fs_info, locked_folio, start, len);
+
+ for (pgoff_t index = start_index; index <= end_index; index++) {
+ struct folio *folio;
+
+ /* Already handled at the beginning. */
+ if (index == locked_folio->index)
+ continue;
+ folio = __filemap_get_folio(mapping, index, FGP_LOCK, GFP_NOFS);
+ /* Cache already dropped, no need to do any cleanup. */
+ if (IS_ERR(folio))
+ continue;
+ btrfs_folio_clamp_clear_dirty(fs_info, folio, start, len);
+ btrfs_folio_clamp_set_writeback(fs_info, folio, start, len);
+ btrfs_folio_clamp_clear_writeback(fs_info, folio, start, len);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+ mapping_set_error(mapping, error);
+}
+
/*
* when nowcow writeback call back. This checks for snapshots or COW copies
* of the extents that exist in the file, and COWs the file as required.
@@ -2217,7 +2259,6 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
cow_start = cur_offset;
if (cow_start != (u64)-1) {
- cur_offset = end;
ret = fallback_to_cow(inode, locked_folio, cow_start, end);
cow_start = (u64)-1;
if (ret)
@@ -2228,6 +2269,22 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
return 0;
error:
+ /*
+ * We have some range with ordered extent created.
+ *
+ * Ordered extents and extent maps will be cleaned up by
+ * btrfs_mark_ordered_io_finished() later, but we also need to clean up
+ * the dirty flags of folios.
+ *
+ * Otherwise they can be written back again, but without any EXTENT_DELALLOC
+ * flag in io tree.
+ * This will force the writeback through COW fixup, which is being deprecated.
+ *
+ * Also such left-over dirty flags do not follow the error handling protocol.
+ */
+ if (cur_offset > start)
+ cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+
/*
* If an error happened while a COW region is outstanding, cur_offset
* needs to be reset to cow_start to ensure the COW region is unlocked
--
2.47.1
From: Kuan-Wei Chiu <visitorckw(a)gmail.com>
The cmp_entries_dup() function used as the comparator for sort()
violated the symmetry and transitivity properties required by the
sorting algorithm. Specifically, it returned 1 whenever memcmp() was
non-zero, which broke the following expectations:
* Symmetry: If x < y, then y > x.
* Transitivity: If x < y and y < z, then x < z.
These violations could lead to incorrect sorting and failure to
correctly identify duplicate elements.
Fix the issue by directly returning the result of memcmp(), which
adheres to the required comparison properties.
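A small userspace demonstration of the difference, using libc qsort() in place of the kernel's sort() and fixed 4-byte keys instead of tracing_map_sort_entry pointers (both assumptions made to keep the sketch self-contained). The old comparator reports "greater" in both directions, while the sign of memcmp() is antisymmetric, so equal keys end up adjacent after sorting.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_KEY_SIZE 4

/* Pre-fix behaviour: any difference is reported as "greater". */
static int cmp_keys_broken(const void *a, const void *b)
{
	return memcmp(a, b, SKETCH_KEY_SIZE) ? 1 : 0;
}

/* Post-fix behaviour: the sign of memcmp() gives a consistent order. */
static int cmp_keys_fixed(const void *a, const void *b)
{
	return memcmp(a, b, SKETCH_KEY_SIZE);
}

static int sign(int v)
{
	return (v > 0) - (v < 0);
}

/* True if cmp(a, b) and cmp(b, a) have opposite signs (or are both 0). */
static int is_antisymmetric(int (*cmp)(const void *, const void *),
			    const void *a, const void *b)
{
	return sign(cmp(a, b)) == -sign(cmp(b, a));
}

/* After sorting with a valid comparator, duplicates are adjacent. */
static size_t count_adjacent_dups(char (*keys)[SKETCH_KEY_SIZE], size_t n)
{
	size_t dups = 0;

	for (size_t i = 1; i < n; i++)
		if (memcmp(keys[i - 1], keys[i], SKETCH_KEY_SIZE) == 0)
			dups++;
	return dups;
}
```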
Cc: stable(a)vger.kernel.org
Fixes: 08d43a5fa063 ("tracing: Add lock-free tracing_map")
Link: https://lore.kernel.org/20241203202228.1274403-1-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/tracing_map.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index 3a56e7c8aa4f..1921ade45be3 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -845,15 +845,11 @@ int tracing_map_init(struct tracing_map *map)
static int cmp_entries_dup(const void *A, const void *B)
{
const struct tracing_map_sort_entry *a, *b;
- int ret = 0;
a = *(const struct tracing_map_sort_entry **)A;
b = *(const struct tracing_map_sort_entry **)B;
- if (memcmp(a->key, b->key, a->elt->map->key_size))
- ret = 1;
-
- return ret;
+ return memcmp(a->key, b->key, a->elt->map->key_size);
}
static int cmp_entries_sum(const void *A, const void *B)
--
2.45.2