The patch titled
Subject: nilfs2: fix failure to detect DAT corruption in btree and direct mappings
has been added to the -mm mm-nonmm-unstable branch. Its filename is
nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix failure to detect DAT corruption in btree and direct mappings
Date: Wed, 13 Mar 2024 19:58:26 +0900
Patch series "nilfs2: fix kernel bug at submit_bh_wbc()".
This resolves a kernel BUG reported by syzbot. Since there are two
flaws involved, I've made each one a separate patch.
The first patch alone resolves the syzbot-reported bug, but I think
both fixes should be sent to stable, so I've tagged them as such.
This patch (of 2):
Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data
to a nilfs2 file system whose metadata is corrupted.
There are two flaws involved in this issue.
The first flaw is that when nilfs_get_block() locates a data block using
btree or direct mapping, if the disk address translation routine
nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata
corruption, it can be passed back to nilfs_get_block(). This causes
nilfs_get_block() to misidentify an existing block as non-existent,
causing both data block lookup and insertion to fail inconsistently.
The second flaw is that nilfs_get_block() returns a successful status in
this inconsistent state. This causes the caller __block_write_begin_int()
or others to request a read even though the buffer is not mapped,
resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc()
failing.
This fixes the first issue by changing the return value to code -EINVAL
when a conversion using DAT fails with code -ENOENT, avoiding the
conflicting condition that leads to the kernel bug described above. Here,
code -EINVAL indicates that metadata corruption was detected during the
block lookup, which will be properly handled as a file system error and
converted to -EIO when passing through the nilfs2 bmap layer.
Link: https://lkml.kernel.org/r/20240313105827.5296-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240313105827.5296-2-konishi.ryusuke@gmail.com
Fixes: c3a7abf06ce7 ("nilfs2: support contiguous lookup of blocks")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+cfed5b56649bddf80d6e(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/btree.c | 9 +++++++--
fs/nilfs2/direct.c | 9 +++++++--
2 files changed, 14 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/btree.c~nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings
+++ a/fs/nilfs2/btree.c
@@ -724,7 +724,7 @@ static int nilfs_btree_lookup_contig(con
dat = nilfs_bmap_get_dat(btree);
ret = nilfs_dat_translate(dat, ptr, &blocknr);
if (ret < 0)
- goto out;
+ goto dat_error;
ptr = blocknr;
}
cnt = 1;
@@ -743,7 +743,7 @@ static int nilfs_btree_lookup_contig(con
if (dat) {
ret = nilfs_dat_translate(dat, ptr2, &blocknr);
if (ret < 0)
- goto out;
+ goto dat_error;
ptr2 = blocknr;
}
if (ptr2 != ptr + cnt || ++cnt == maxblocks)
@@ -781,6 +781,11 @@ static int nilfs_btree_lookup_contig(con
out:
nilfs_btree_free_path(path);
return ret;
+
+ dat_error:
+ if (ret == -ENOENT)
+ ret = -EINVAL; /* Notify bmap layer of metadata corruption */
+ goto out;
}
static void nilfs_btree_promote_key(struct nilfs_bmap *btree,
--- a/fs/nilfs2/direct.c~nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings
+++ a/fs/nilfs2/direct.c
@@ -66,7 +66,7 @@ static int nilfs_direct_lookup_contig(co
dat = nilfs_bmap_get_dat(direct);
ret = nilfs_dat_translate(dat, ptr, &blocknr);
if (ret < 0)
- return ret;
+ goto dat_error;
ptr = blocknr;
}
@@ -79,7 +79,7 @@ static int nilfs_direct_lookup_contig(co
if (dat) {
ret = nilfs_dat_translate(dat, ptr2, &blocknr);
if (ret < 0)
- return ret;
+ goto dat_error;
ptr2 = blocknr;
}
if (ptr2 != ptr + cnt)
@@ -87,6 +87,11 @@ static int nilfs_direct_lookup_contig(co
}
*ptrp = ptr;
return cnt;
+
+ dat_error:
+ if (ret == -ENOENT)
+ ret = -EINVAL; /* Notify bmap layer of metadata corruption */
+ return ret;
}
static __u64
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings.patch
nilfs2-prevent-kernel-bug-at-submit_bh_wbc.patch
Boris managed to create a device capable of changing its maj:min without
altering its device path.
Only multi-devices can be scanned. A device that gets scanned and remains
in the Btrfs kernel cache might end up with an incorrect maj:min.
Despite the tempfsid feature patch did not introduce this bug, it could
lead to issues if the above multi-device is converted to a single device
with a stale maj:min. Subsequently, attempting to mount the same device
with the correct maj:min might mistake it for another device with the same
fsid, potentially resulting in wrongly auto-enabling the tempfsid feature.
To address this, this patch validates the device's maj:min at the time of
device open and updates it if it has changed since the last scan.
CC: stable(a)vger.kernel.org # 6.7+
Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability")
Reported-by: Boris Burkov <boris(a)bur.io>
Co-developed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
---
v2:
Drop using lookup_bdev() instead, get it from device->bdev->bd_dev.
v1:
https://lore.kernel.org/linux-btrfs/752b8526be21d984e0ee58c7f66d312664ff5ac…
fs/btrfs/volumes.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e49935a54da0..c318640b4472 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -692,6 +692,16 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
device->bdev = bdev_handle->bdev;
clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
+ if (device->devt != device->bdev->bd_dev) {
+ btrfs_warn(NULL,
+ "device %s maj:min changed from %d:%d to %d:%d",
+ device->name->str, MAJOR(device->devt),
+ MINOR(device->devt), MAJOR(device->bdev->bd_dev),
+ MINOR(device->bdev->bd_dev));
+
+ device->devt = device->bdev->bd_dev;
+ }
+
fs_devices->open_devices++;
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
device->devid != BTRFS_DEV_REPLACE_DEVID) {
--
2.38.1
The current code in pcie_wait_for_link_delay() handles the value
returned by pcie_failed_link_retrain() as an integer, expecting 0
when the link has been successfully retrained. The issue is that
pcie_failed_link_retrain() returns a boolean: "true" if the link
has been successfully retrained and "false" otherwise. This leads
pcie_wait_for_link_delay() to return an incorrect "active link"
status when pcie_failed_link_retrain() is called.
This patch fixes the check of the value returned by
pcie_failed_link_retrain() in pcie_wait_for_link_delay().
Note that this bug induces abnormal timeout delays when a PCI device
is unplugged (around 60 seconds per bridge / secondary bus removed).
Cc: stable(a)vger.kernel.org
Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Signed-off-by: Simon Guinot <simon.guinot(a)seagate.com>
---
drivers/pci/pci.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ccee56615f78..7ec91b4c5d03 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5101,9 +5101,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
msleep(20);
rc = pcie_wait_for_link_status(pdev, false, active);
if (active) {
- if (rc)
- rc = pcie_failed_link_retrain(pdev);
- if (rc)
+ if (rc && !pcie_failed_link_retrain(pdev))
return false;
msleep(delay);
--
2.43.0
arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main
powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an
error when building with clang after a recent change in main:
error: option '-msoft-float' cannot be specified with '-maltivec'
make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error 1
Explicitly add '-mhard-float' before '-maltivec' in xor_vmx.o's CFLAGS
to override the previous inclusion of '-msoft-float' (as the last option
wins), which matches how other areas of the kernel use '-maltivec', such
as AMDGPU.
Cc: stable(a)vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1986
Link: https://github.com/llvm/llvm-project/commit/4792f912b232141ecba4cbae538873b…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
arch/powerpc/lib/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 6eac63e79a89..0ab65eeb93ee 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -76,7 +76,7 @@ obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
-CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
# Enable <altivec.h>
CFLAGS_xor_vmx.o += -isystem $(shell $(CC) -print-file-name=include)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240127-ppc-xor_vmx-drop-msoft-float-ad68b437f86c
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Without the device table the driver will not auto-load when compiled as
a module.
Fixes: 273db8f03509 ("Input: add IOC3 serio driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Karel Balej <balejk(a)matfyz.cz>
---
I do not own any device using this driver, but I verified that modinfo
does not show the "platform:ioc3-kbd" alias when run on the module
compiled without this patch and it does with it.
drivers/input/serio/ioc3kbd.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/input/serio/ioc3kbd.c b/drivers/input/serio/ioc3kbd.c
index 50552dc7b4f5..676b0bda3d72 100644
--- a/drivers/input/serio/ioc3kbd.c
+++ b/drivers/input/serio/ioc3kbd.c
@@ -200,9 +200,16 @@ static void ioc3kbd_remove(struct platform_device *pdev)
serio_unregister_port(d->aux);
}
+static const struct platform_device_id ioc3kbd_id_table[] = {
+ { "ioc3-kbd", },
+ { }
+};
+MODULE_DEVICE_TABLE(platform, ioc3kbd_id_table);
+
static struct platform_driver ioc3kbd_driver = {
.probe = ioc3kbd_probe,
.remove_new = ioc3kbd_remove,
+ .id_table = ioc3kbd_id_table,
.driver = {
.name = "ioc3-kbd",
},
--
2.44.0
From: Zi Yan <ziy(a)nvidia.com>
The tail pages in a THP can have swap entry information stored in their
private field. When migrating to a new page, all tail pages of the new
page need to update ->private to avoid future data corruption.
This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory:
work on folio->swap instead of page->private when splitting folio"),
subpages of a swapcached THP no longer requires the maintenance.
Adding THPs to the swapcache was introduced in commit
38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out"),
where each subpage of a THP added to the swapcache had its own swapcache
entry and required the ->private field to point to the correct swapcache
entry. Later, when THP migration functionality was implemented in commit
616b8371539a6 ("mm: thp: enable thp migration in generic path"),
it initially did not handle the subpages of swapcached THPs, failing to
update their ->private fields or replace the subpage pointers in the
swapcache. Subsequently, commit e71769ae5260 ("mm: enable thp migration
for shmem thp") addressed the swapcache update aspect. This patch fixes
the update of subpage ->private fields.
Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_cha…
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
---
mm/migrate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index c93dd6a31c31..c5968021fde0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -423,8 +423,12 @@ int folio_migrate_mapping(struct address_space *mapping,
if (folio_test_swapbacked(folio)) {
__folio_set_swapbacked(newfolio);
if (folio_test_swapcache(folio)) {
+ int i;
+
folio_set_swapcache(newfolio);
- newfolio->private = folio_get_private(folio);
+ for (i = 0; i < nr; i++)
+ set_page_private(folio_page(newfolio, i),
+ page_private(folio_page(folio, i)));
}
entries = nr;
} else {
--
2.43.0
In f2fs_do_write_data_page, FI_ATOMIC_FILE flag selects the target inode
between the original inode and COW inode. When aborting atomic write and
writeback occur simultaneously, invalid data can be written to original
inode if the FI_ATOMIC_FILE flag is cleared meanwhile.
To prevent the problem, let's truncate all pages before clearing the flag
Atomic write thread Writeback thread
f2fs_abort_atomic_write
clear_inode_flag(inode, FI_ATOMIC_FILE)
__writeback_single_inode
do_writepages
f2fs_do_write_data_page
- use dn of original inode
truncate_inode_pages_final
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/segment.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 7901ede58113..7e47b8054413 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -192,6 +192,9 @@ void f2fs_abort_atomic_write(struct inode *inode, bool clean)
if (!f2fs_is_atomic_file(inode))
return;
+ if (clean)
+ truncate_inode_pages_final(inode->i_mapping);
+
release_atomic_write_cnt(inode);
clear_inode_flag(inode, FI_ATOMIC_COMMITTED);
clear_inode_flag(inode, FI_ATOMIC_REPLACE);
@@ -201,7 +204,6 @@ void f2fs_abort_atomic_write(struct inode *inode, bool clean)
F2FS_I(inode)->atomic_write_task = NULL;
if (clean) {
- truncate_inode_pages_final(inode->i_mapping);
f2fs_i_size_write(inode, fi->original_i_size);
fi->original_i_size = 0;
}
--
2.25.1
In f2fs_update_inode, i_size of the atomic file isn't updated until
FI_ATOMIC_COMMITTED flag is set. When committing atomic write right
after the writeback of the inode, i_size of the raw inode will not be
updated. It can cause the atomicity corruption due to a mismatch between
old file size and new data.
To prevent the problem, let's mark inode dirty for FI_ATOMIC_COMMITTED
Atomic write thread Writeback thread
__writeback_single_inode
write_inode
f2fs_update_inode
- skip i_size update
f2fs_ioc_commit_atomic_write
f2fs_commit_atomic_write
set_inode_flag(inode, FI_ATOMIC_COMMITTED)
f2fs_do_sync_file
f2fs_fsync_node_pages
- skip f2fs_update_inode since the inode is clean
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/f2fs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 543898482f8b..a000cb024dbe 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3039,6 +3039,7 @@ static inline void __mark_inode_dirty_flag(struct inode *inode,
case FI_INLINE_DOTS:
case FI_PIN_FILE:
case FI_COMPRESS_RELEASED:
+ case FI_ATOMIC_COMMITTED:
f2fs_mark_inode_dirty_sync(inode, true);
}
}
--
2.25.1