Recently we found a bug related with ext4 buffer head is fixed by
commit 0b73284c564d("ext4: ext4_read_bh_lock() should submit IO if the
buffer isn't uptodate")[1].
This bug is fixed on some kernel long term versions, such as 5.10 and 5.15.
However, on 5.4 stable version, we can still easily reproduce this bug by
adding some delay after buffer_migrate_lock_buffers() in __buffer_migrate_page()
and do fsstress on the ext4 filesystem. We can get some errors in dmesg like:
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193:
comm fsstress: reading directory lblock 0
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334:
comm fsstress: reading directory lblock 0
About how to fix this bug in 5.4 version, currently I have three ideas.
But I don't know which one is better or is there any other feasible way to
fix this bug elegantly based on the 5.4 stable branch?
The first idea comes from this thread[2]. In __buffer_migrate_page(),
we can let it fallback to migrate_page that are not uptodate like
fallback_migrate_page(), those pages that has buffers may probably do
read operation soon. From [3], we can see this solution is not good enough
because there are other places that lock the buffer without doing IO.
I think this solution can be a candidate option to fix if we do not want to
change a lot. Also based on my test results, the ext4 filesystem remains
stable after one week stress test with this patch applied.
The second idea is backport a series of commits from upstream, such as
2d069c0889ef ("ext4: use common helpers in all places reading metadata buffers")
0b73284c564d ("ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate")
79f597842069 ("fs/buffer: remove ll_rw_block() helper")
This will lead to many lines of code change and should be carefully conducted,
but it looks like the most reasonable solution so far.
The third idea is replace trylock_buffer in ll_rw_block() with lock_buffer and
change ll_rw_block() in __breadahead_gfp() to trylock_buffer. However,
this will change the semantic of ll_rw_block(), and will not be suitable for
some readahead circumstances. Besides, the ll_rw_block() has many occurences
among many filesystems other than ext4, I think it is better to limit the
fix in the ext4 filesystem without affecting other filesystems.
Here I send the patch based on the first idea, hope someone can give more ideas
about how to fix this bug in kernel 5.4 version, thanks.
[1] https://lore.kernel.org/linux-mm/20220825080146.2021641-1-chengzhihao1@huaw…
[2] https://lore.kernel.org/all/20220831074629.3755110-1-yi.zhang@huawei.com/T/
[3] https://lore.kernel.org/linux-mm/20220825105704.e46hz6dp6opawsjk@quack3/
Yue Zhao (1):
mm: migrate: buffer_migrate_page_norefs() fallback migrate not
uptodate pages
mm/migrate.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
--
2.17.1
This triggers a -Wdeclaration-after-statement as the code has changed a
bit since upstream. It might be better to hoist the whole block up, but
this is a smaller change so I went with it.
arch/riscv/mm/init.c:755:16: warning: mixing declarations and code is a C99 extension [-Wdeclaration-after-statement]
unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT));
^
1 warning generated.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304300429.SXZOA5up-lkp@intel.com/
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
I haven't even build tested this one, but it looks simple enough that I figured
I'd just send it. Be warned, though: I broke glibc and missed a merged
conflict yesterday...
---
arch/riscv/mm/init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index e800d7981e99..8d67f43f1865 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -717,6 +717,7 @@ static void __init setup_vm_final(void)
uintptr_t va, map_size;
phys_addr_t pa, start, end;
u64 i;
+ unsigned long idx;
/**
* MMU is enabled at this point. But page table setup is not complete yet.
@@ -735,7 +736,7 @@ static void __init setup_vm_final(void)
* directly in swapper_pg_dir in addition to the pgd entry that points
* to fixmap_pte.
*/
- unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT));
+ idx = pgd_index(__fix_to_virt(FIX_FDT));
set_pgd(&swapper_pg_dir[idx], early_pg_dir[idx]);
#endif
--
2.40.0
Hi,
Some Pink Sardine platforms have some stability problems with reboot
cycling and it has been root caused to a misconfigured mux for audio.
It's been fixed in this commit:
a4d432e9132c ("ASoC: amd: ps: update the acp clock source.")
Can you please backport this to 6.1.y +
Thanks,
Hi,
A number of laptops with Mediatek wifi the wifi doesn't work unless you
turn off fast boot in BIOS setup.
These laptops all ship with fast boot as the default.
It's been fixed by this commit:
09d4d6da1b65 ("wifi: mt76: mt7921e: Set memory space enable in
PCI_COMMAND if unset")
Can you please bring it to 5.15.y and later?
Thanks,
The driver have a race, experienced only with PREEMPT_RT patchset:
CPU0 | CPU1
==================================================================
qcom_geni_serial_probe |
uart_add_one_port |
| serdev_drv_probe
| qca_serdev_probe
| serdev_device_open
| uart_open
| uart_startup
| qcom_geni_serial_startup
| enable_irq
| __irq_startup
| WARN_ON()
| IRQ not activated
request_threaded_irq |
irq_domain_activate_irq |
The warning:
894000.serial: ttyHS1 at MMIO 0x894000 (irq = 144, base_baud = 0) is a MSM
serial serial0: tty port ttyHS1 registered
WARNING: CPU: 7 PID: 107 at kernel/irq/chip.c:241 __irq_startup+0x78/0xd8
...
qcom_geni_serial 894000.serial: serial engine reports 0 RX bytes in!
Adding UART port triggers probe of child serial devices - serdev and
eventually Qualcomm Bluetooth hci_qca driver. This opens UART port
which enables the interrupt before it got activated in
request_threaded_irq(). The issue originates in commit f3974413cf02
("tty: serial: qcom_geni_serial: Wakeup IRQ cleanup") and discussion on
mailing list [1]. However the above commit does not explain why the
uart_add_one_port() is moved above requesting interrupt.
[1] https://lore.kernel.org/all/5d9f3dfa.1c69fb81.84c4b.30bf@mx.google.com/
Fixes: f3974413cf02 ("tty: serial: qcom_geni_serial: Wakeup IRQ cleanup")
Cc: <stable(a)vger.kernel.org>
Cc: Stephen Boyd <swboyd(a)chromium.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
drivers/tty/serial/qcom_geni_serial.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 08dc3e2a729c..8582479f0211 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1664,19 +1664,18 @@ static int qcom_geni_serial_probe(struct platform_device *pdev)
uport->private_data = &port->private_data;
platform_set_drvdata(pdev, port);
- ret = uart_add_one_port(drv, uport);
- if (ret)
- return ret;
-
irq_set_status_flags(uport->irq, IRQ_NOAUTOEN);
ret = devm_request_irq(uport->dev, uport->irq, qcom_geni_serial_isr,
IRQF_TRIGGER_HIGH, port->name, uport);
if (ret) {
dev_err(uport->dev, "Failed to get IRQ ret %d\n", ret);
- uart_remove_one_port(drv, uport);
return ret;
}
+ ret = uart_add_one_port(drv, uport);
+ if (ret)
+ return ret;
+
/*
* Set pm_runtime status as ACTIVE so that wakeup_irq gets
* enabled/disabled from dev_pm_arm_wake_irq during system
--
2.34.1
In binder_transaction_buffer_release() the 'failed_at' offset indicates
the number of objects to clean up. However, this function was changed by
commit 44d8047f1d87 ("binder: use standard functions to allocate fds"),
to release all the objects in the buffer when 'failed_at' is zero.
This introduced an issue when a transaction buffer is released without
any objects having been processed so far. In this case, 'failed_at' is
indeed zero yet it is misinterpreted as releasing the entire buffer.
This leads to use-after-free errors where nodes are incorrectly freed
and subsequently accessed. Such is the case in the following KASAN
report:
==================================================================
BUG: KASAN: slab-use-after-free in binder_thread_read+0xc40/0x1f30
Read of size 8 at addr ffff4faf037cfc58 by task poc/474
CPU: 6 PID: 474 Comm: poc Not tainted 6.3.0-12570-g7df047b3f0aa #5
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x94/0xec
show_stack+0x18/0x24
dump_stack_lvl+0x48/0x60
print_report+0xf8/0x5b8
kasan_report+0xb8/0xfc
__asan_load8+0x9c/0xb8
binder_thread_read+0xc40/0x1f30
binder_ioctl+0xd9c/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
Allocated by task 474:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_alloc_info+0x24/0x34
__kasan_kmalloc+0xb8/0xbc
kmalloc_trace+0x48/0x5c
binder_new_node+0x3c/0x3a4
binder_transaction+0x2b58/0x36f0
binder_thread_write+0x8e0/0x1b78
binder_ioctl+0x14a0/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
Freed by task 475:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_free_info+0x38/0x5c
__kasan_slab_free+0xe8/0x154
__kmem_cache_free+0x128/0x2bc
kfree+0x58/0x70
binder_dec_node_tmpref+0x178/0x1fc
binder_transaction_buffer_release+0x430/0x628
binder_transaction+0x1954/0x36f0
binder_thread_write+0x8e0/0x1b78
binder_ioctl+0x14a0/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
==================================================================
In order to avoid these issues, let's always calculate the intended
'failed_at' offset beforehand. This is wrapped in a helper function to
make it clear and convenient.
Fixes: 32e9f56a96d8 ("binder: don't detect sender/target during buffer cleanup")
Reported-by: Zi Fan Tan <zifantan(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
drivers/android/binder.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index fb56bfc45096..6678a862ea84 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1938,7 +1938,7 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
bool is_failure)
{
int debug_id = buffer->debug_id;
- binder_size_t off_start_offset, buffer_offset, off_end_offset;
+ binder_size_t off_start_offset, buffer_offset;
binder_debug(BINDER_DEBUG_TRANSACTION,
"%d buffer release %d, size %zd-%zd, failed at %llx\n",
@@ -1950,9 +1950,8 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
binder_dec_node(buffer->target_node, 1, 0);
off_start_offset = ALIGN(buffer->data_size, sizeof(void *));
- off_end_offset = is_failure && failed_at ? failed_at :
- off_start_offset + buffer->offsets_size;
- for (buffer_offset = off_start_offset; buffer_offset < off_end_offset;
+
+ for (buffer_offset = off_start_offset; buffer_offset < failed_at;
buffer_offset += sizeof(binder_size_t)) {
struct binder_object_header *hdr;
size_t object_size = 0;
@@ -2111,6 +2110,25 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
}
}
+/* Clean up all the objects in the buffer */
+static inline void binder_release_entire_buffer(struct binder_proc *proc,
+ struct binder_thread *thread,
+ struct binder_buffer *buffer,
+ bool is_failure)
+{
+ binder_size_t off_end_offset;
+
+ off_end_offset = ALIGN(buffer->data_size, sizeof(void *));
+ off_end_offset += buffer->offsets_size;
+
+ /* We always pass the end of the buffer here to make sure that
+ * binder_transaction_buffer_release() loops through all the
+ * objects in the buffer.
+ */
+ binder_transaction_buffer_release(proc, thread, buffer,
+ off_end_offset, is_failure);
+}
+
static int binder_translate_binder(struct flat_binder_object *fp,
struct binder_transaction *t,
struct binder_thread *thread)
@@ -2806,7 +2824,7 @@ static int binder_proc_transaction(struct binder_transaction *t,
t_outdated->buffer = NULL;
buffer->transaction = NULL;
trace_binder_transaction_update_buffer_release(buffer);
- binder_transaction_buffer_release(proc, NULL, buffer, 0, 0);
+ binder_release_entire_buffer(proc, NULL, buffer, false);
binder_alloc_free_buf(&proc->alloc, buffer);
kfree(t_outdated);
binder_stats_deleted(BINDER_STAT_TRANSACTION);
@@ -3775,7 +3793,7 @@ binder_free_buf(struct binder_proc *proc,
binder_node_inner_unlock(buf_node);
}
trace_binder_transaction_buffer_release(buffer);
- binder_transaction_buffer_release(proc, thread, buffer, 0, is_failure);
+ binder_release_entire_buffer(proc, thread, buffer, is_failure);
binder_alloc_free_buf(&proc->alloc, buffer);
}
--
2.40.1.521.gf1e218fcd8-goog
The quilt patch titled
Subject: nilfs2: do not write dirty data after degenerating to read-only
has been removed from the -mm tree. Its filename was
nilfs2-do-not-write-dirty-data-after-degenerating-to-read-only.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: do not write dirty data after degenerating to read-only
Date: Thu, 27 Apr 2023 10:15:26 +0900
According to syzbot's report, mark_buffer_dirty() called from
nilfs_segctor_do_construct() outputs a warning with some patterns after
nilfs2 detects metadata corruption and degrades to read-only mode.
After such read-only degeneration, page cache data may be cleared through
nilfs_clear_dirty_page() which may also clear the uptodate flag for their
buffer heads. However, even after the degeneration, log writes are still
performed by unmount processing etc., which causes mark_buffer_dirty() to
be called for buffer heads without the "uptodate" flag and causes the
warning.
Since any writes should not be done to a read-only file system in the
first place, this fixes the warning in mark_buffer_dirty() by letting
nilfs_segctor_do_construct() abort early if in read-only mode.
This also changes the retry check of nilfs_segctor_write_out() to avoid
unnecessary log write retries if it detects -EROFS that
nilfs_segctor_do_construct() returned.
Link: https://lkml.kernel.org/r/20230427011526.13457-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+2af3bc9585be7f23f290(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=2af3bc9585be7f23f290
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/segment.c~nilfs2-do-not-write-dirty-data-after-degenerating-to-read-only
+++ a/fs/nilfs2/segment.c
@@ -2041,6 +2041,9 @@ static int nilfs_segctor_do_construct(st
struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
int err;
+ if (sb_rdonly(sci->sc_super))
+ return -EROFS;
+
nilfs_sc_cstage_set(sci, NILFS_ST_INIT);
sci->sc_cno = nilfs->ns_cno;
@@ -2724,7 +2727,7 @@ static void nilfs_segctor_write_out(stru
flush_work(&sci->sc_iput_work);
- } while (ret && retrycount-- > 0);
+ } while (ret && ret != -EROFS && retrycount-- > 0);
}
/**
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: mm: do not reclaim private data from pinned page
has been removed from the -mm tree. Its filename was
mm-do-not-reclaim-private-data-from-pinned-page.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: mm: do not reclaim private data from pinned page
Date: Fri, 28 Apr 2023 14:41:40 +0200
If the page is pinned, there's no point in trying to reclaim it.
Furthermore if the page is from the page cache we don't want to reclaim
fs-private data from the page because the pinning process may be writing
to the page at any time and reclaiming fs private info on a dirty page can
upset the filesystem (see link below).
Link: https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz
Link: https://lkml.kernel.org/r/20230428124140.30166-1-jack@suse.cz
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reviewed-by: Lorenzo Stoakes <lstoakes(a)gmail.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/mm/vmscan.c~mm-do-not-reclaim-private-data-from-pinned-page
+++ a/mm/vmscan.c
@@ -1967,6 +1967,16 @@ retry:
}
}
+ /*
+ * Folio is unmapped now so it cannot be newly pinned anymore.
+ * No point in trying to reclaim folio if it is pinned.
+ * Furthermore we don't want to reclaim underlying fs metadata
+ * if the folio is pinned and thus potentially modified by the
+ * pinning process as that may upset the filesystem.
+ */
+ if (folio_maybe_dma_pinned(folio))
+ goto activate_locked;
+
mapping = folio_mapping(folio);
if (folio_test_dirty(folio)) {
/*
_
Patches currently in -mm which might be from jack(a)suse.cz are
The quilt patch titled
Subject: nilfs2: fix infinite loop in nilfs_mdt_get_block()
has been removed from the -mm tree. Its filename was
nilfs2-fix-infinite-loop-in-nilfs_mdt_get_block.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix infinite loop in nilfs_mdt_get_block()
Date: Mon, 1 May 2023 04:30:46 +0900
If the disk image that nilfs2 mounts is corrupted and a virtual block
address obtained by block lookup for a metadata file is invalid,
nilfs_bmap_lookup_at_level() may return the same internal return code as
-ENOENT, meaning the block does not exist in the metadata file.
This duplication of return codes confuses nilfs_mdt_get_block(), causing
it to read and create a metadata block indefinitely.
In particular, if this happens to the inode metadata file, ifile,
semaphore i_rwsem can be left held, causing task hangs in lock_mount.
Fix this issue by making nilfs_bmap_lookup_at_level() treat virtual block
address translation failures with -ENOENT as metadata corruption instead
of returning the error code.
Link: https://lkml.kernel.org/r/20230430193046.6769-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+221d75710bde87fa0e97(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=221d75710bde87fa0e97
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/bmap.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/bmap.c~nilfs2-fix-infinite-loop-in-nilfs_mdt_get_block
+++ a/fs/nilfs2/bmap.c
@@ -67,20 +67,28 @@ int nilfs_bmap_lookup_at_level(struct ni
down_read(&bmap->b_sem);
ret = bmap->b_ops->bop_lookup(bmap, key, level, ptrp);
- if (ret < 0) {
- ret = nilfs_bmap_convert_error(bmap, __func__, ret);
+ if (ret < 0)
goto out;
- }
+
if (NILFS_BMAP_USE_VBN(bmap)) {
ret = nilfs_dat_translate(nilfs_bmap_get_dat(bmap), *ptrp,
&blocknr);
if (!ret)
*ptrp = blocknr;
+ else if (ret == -ENOENT) {
+ /*
+ * If there was no valid entry in DAT for the block
+ * address obtained by b_ops->bop_lookup, then pass
+ * internal code -EINVAL to nilfs_bmap_convert_error
+ * to treat it as metadata corruption.
+ */
+ ret = -EINVAL;
+ }
}
out:
up_read(&bmap->b_sem);
- return ret;
+ return nilfs_bmap_convert_error(bmap, __func__, ret);
}
int nilfs_bmap_lookup_contig(struct nilfs_bmap *bmap, __u64 key, __u64 *ptrp,
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are