From: Filipe Manana <fdmanana(a)suse.com>
commit 939b656bc8ab203fdbde26ccac22bcb7f0985be5 upstream.
During an append (O_APPEND write flag) direct IO write, if the input buffer
was not previously faulted in, we can corrupt the file in a way that the
final size is unexpected and it includes an unexpected hole.
The problem happens like this:
1) We have an empty file, with size 0, for example;
2) We do an O_APPEND direct IO with a length of 4096 bytes and the input
buffer is not currently faulted in;
3) We enter btrfs_direct_write(), lock the inode and call
generic_write_checks(), which calls generic_write_checks_count(), and
that function sets the iocb position to 0 with the following code:
if (iocb->ki_flags & IOCB_APPEND)
iocb->ki_pos = i_size_read(inode);
4) We call btrfs_dio_write() and enter into iomap, which will end up
calling btrfs_dio_iomap_begin() and that calls
btrfs_get_blocks_direct_write(), where we update the i_size of the
inode to 4096 bytes;
5) After btrfs_dio_iomap_begin() returns, iomap will attempt to access
the page of the write input buffer (at iomap_dio_bio_iter(), with a
call to bio_iov_iter_get_pages()) and fail with -EFAULT, which gets
returned to btrfs at btrfs_direct_write() via btrfs_dio_write();
6) At btrfs_direct_write() we get the -EFAULT error, unlock the inode,
fault in the write buffer and then go to the label 'relock';
7) We lock the inode again, do all the necessary checks again and call
generic_write_checks() again, which calls generic_write_checks_count()
again, and there we set the iocb's position to 4K, which is the current
i_size of the inode, with the same code pointed out above:
if (iocb->ki_flags & IOCB_APPEND)
iocb->ki_pos = i_size_read(inode);
8) Then we go again into btrfs_dio_write(), enter iomap, and this time the
write succeeds, but it writes to the file range [4K, 8K[, leaving a hole
in the [0, 4K[ range and an i_size of 8K, which goes against the
expectation of having the data written to the range [0, 4K[ and getting
an i_size of 4K.
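Condensed, the buggy retry loop looked conceptually like this (an
illustrative sketch with simplified error handling, not the verbatim
kernel code):

relock:
	btrfs_inode_lock(BTRFS_I(inode), ilock_flags);
	generic_write_checks(iocb, from);   /* IOCB_APPEND: ki_pos = i_size */
	...
	dio = btrfs_dio_write(iocb, from, written);  /* may grow i_size and
							still fail -EFAULT */
	btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
	...
	if (err == -EFAULT) {
		fault_in_iov_iter_readable(from, left);
		goto relock;   /* next pass re-reads i_size: ki_pos jumps */
	}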
Fix this by not unlocking the inode before faulting in the input buffer,
in case we get -EFAULT or an incomplete write, and by not jumping to the
'relock' label after faulting in the buffer - instead jump to a location
immediately before calling into iomap, skipping all the write checks and
relocking. This solves the problem and is fine even if the input buffer
is memory mapped to the same file range: since only holding the range
locked in the inode's io tree can cause a deadlock, it's safe to keep the
inode lock (the VFS lock) held, as was fixed and described in commit
51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO
reads and writes").
A sample reproducer provided by a reporter is the following:
$ cat test.c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	if (argc < 2) {
		fprintf(stderr, "Usage: %s <test file>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT |
		      O_APPEND, 0644);
	if (fd < 0) {
		perror("creating test file");
		return 1;
	}

	/* Anonymous mapping that is never touched, so the write buffer
	 * is not faulted in when write() is called. */
	char *buf = mmap(NULL, 4096, PROT_READ,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	ssize_t ret = write(fd, buf, 4096);
	if (ret < 0) {
		perror("write");
		return 1;
	}

	struct stat stbuf;
	ret = fstat(fd, &stbuf);
	if (ret < 0) {
		perror("fstat");
		return 1;
	}

	printf("size: %llu\n", (unsigned long long)stbuf.st_size);
	return stbuf.st_size == 4096 ? 0 : 1;
}
A test case for fstests will be sent soon.
Reported-by: Hanna Czenczek <hreitz(a)redhat.com>
Link: https://lore.kernel.org/linux-btrfs/0b841d46-12fe-4e64-9abb-871d8d0de271@re…
Fixes: 8184620ae212 ("btrfs: fix lost file sync on direct IO write with nowait and dsync iocb")
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
---
fs/btrfs/ctree.h | 1 +
fs/btrfs/file.c | 55 ++++++++++++++++++++++++++++++++++++------------
2 files changed, 43 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 06333a74d6c4..86c7f8ce1715 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -445,6 +445,7 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
+ bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c997b790568f..a47a7bbf9b7e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1535,21 +1535,37 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* So here we disable page faults in the iov_iter and then retry if we
* got -EFAULT, faulting in the pages before the retry.
*/
+again:
from->nofault = true;
dio = btrfs_dio_write(iocb, from, written);
from->nofault = false;
- /*
- * iomap_dio_complete() will call btrfs_sync_file() if we have a dsync
- * iocb, and that needs to lock the inode. So unlock it before calling
- * iomap_dio_complete() to avoid a deadlock.
- */
- btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
-
- if (IS_ERR_OR_NULL(dio))
+ if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
- else
+ } else {
+ struct btrfs_file_private stack_private = { 0 };
+ struct btrfs_file_private *private;
+ const bool have_private = (file->private_data != NULL);
+
+ if (!have_private)
+ file->private_data = &stack_private;
+
+ /*
+ * If we have a synchronous write, we must make sure the fsync
+ * triggered by the iomap_dio_complete() call below doesn't
+ * deadlock on the inode lock - we are already holding it and we
+ * can't call it after unlocking because we may need to complete
+ * partial writes due to the input buffer (or parts of it) not
+ * being already faulted in.
+ */
+ private = file->private_data;
+ private->fsync_skip_inode_lock = true;
err = iomap_dio_complete(dio);
+ private->fsync_skip_inode_lock = false;
+
+ if (!have_private)
+ file->private_data = NULL;
+ }
/* No increment (+=) because iomap returns a cumulative value. */
if (err > 0)
@@ -1576,10 +1592,12 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
} else {
fault_in_iov_iter_readable(from, left);
prev_left = left;
- goto relock;
+ goto again;
}
}
+ btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
+
/*
* If 'err' is -ENOTBLK or we have not written all data, then it means
* we must fallback to buffered IO.
@@ -1778,6 +1796,7 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
+ struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1787,6 +1806,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
+ const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
trace_btrfs_sync_file(file, datasync);
@@ -1814,7 +1834,10 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
if (ret)
goto out;
- btrfs_inode_lock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
+ if (skip_ilock)
+ down_write(&BTRFS_I(inode)->i_mmap_lock);
+ else
+ btrfs_inode_lock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
atomic_inc(&root->log_batch);
@@ -1838,7 +1861,10 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
*/
ret = start_ordered_ops(inode, start, end);
if (ret) {
- btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
+ if (skip_ilock)
+ up_write(&BTRFS_I(inode)->i_mmap_lock);
+ else
+ btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
goto out;
}
@@ -1941,7 +1967,10 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
* file again, but that will end up using the synchronization
* inside btrfs_sync_log to keep things safe.
*/
- btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
+ if (skip_ilock)
+ up_write(&BTRFS_I(inode)->i_mmap_lock);
+ else
+ btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
if (ret == BTRFS_NO_LOG_SYNC) {
ret = btrfs_end_transaction(trans);
--
2.43.0
This set of patches adds the clocks necessary for USB on the Exynos7885
SoC.
While at it, also fix some issues with the existing driver/bindings.
This set was split from a previous set containing clk, phy, and usb
patches [1].
Changes in v2:
- Split from full patchset.
- Added Cc: stable tags and a Fixes tag to the CLKS_NR_FSYS update patch
- Blank line fixes
Cc: stable(a)vger.kernel.org
[1] https://lore.kernel.org/linux-samsung-soc/20240804215458.404085-1-virag.dav…
David Virag (7):
dt-bindings: clock: exynos7885: Fix duplicated binding
dt-bindings: clock: exynos7885: Add CMU_TOP PLL MUX indices
dt-bindings: clock: exynos7885: Add indices for USB clocks
clk: samsung: exynos7885: Update CLKS_NR_FSYS after bindings fix
clk: samsung: exynos7885: Add missing MUX clocks from PLLs in CMU_TOP
clk: samsung: clk-pll: Add support for pll_1418x
clk: samsung: exynos7885: Add USB related clocks to CMU_FSYS
drivers/clk/samsung/clk-exynos7885.c | 93 ++++++++++++++++++++------
drivers/clk/samsung/clk-pll.c | 20 ++++--
drivers/clk/samsung/clk-pll.h | 1 +
include/dt-bindings/clock/exynos7885.h | 32 ++++++---
4 files changed, 111 insertions(+), 35 deletions(-)
--
2.46.0
Coverity reports (as CID 1536978) that uart_poll_init() passes an
uninitialized pm_state to uart_change_pm(). This happens when the first
'if' takes the true branch (does "goto out;").
Fix this and simplify the function with a simple guard(mutex). The code
then needs no labels at all, and it is clear that the code has not
fiddled with pm_state at the early-return point.
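For reference, the scope-based guard() pattern from <linux/cleanup.h>
works like this (a minimal illustrative sketch with made-up names, not
part of the patch):

	#include <linux/cleanup.h>
	#include <linux/mutex.h>

	static DEFINE_MUTEX(demo_lock);

	static int demo(bool ready)
	{
		/* Locks demo_lock here; unlocks automatically on every
		 * return path once the guard goes out of scope. */
		guard(mutex)(&demo_lock);

		if (!ready)
			return -1;	/* no unlock label needed */

		return 0;
	}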
Signed-off-by: Jiri Slaby (SUSE) <jirislaby(a)kernel.org>
Fixes: 5e227ef2aa38 ("serial: uart_poll_init() should power on the UART")
Cc: stable(a)vger.kernel.org
Cc: Douglas Anderson <dianders(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/serial_core.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 3afe77f05abf..d63e9b636e02 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2690,14 +2690,13 @@ static int uart_poll_init(struct tty_driver *driver, int line, char *options)
int ret = 0;
tport = &state->port;
- mutex_lock(&tport->mutex);
+
+ guard(mutex)(&tport->mutex);
port = uart_port_check(state);
if (!port || port->type == PORT_UNKNOWN ||
- !(port->ops->poll_get_char && port->ops->poll_put_char)) {
- ret = -1;
- goto out;
- }
+ !(port->ops->poll_get_char && port->ops->poll_put_char))
+ return -1;
pm_state = state->pm_state;
uart_change_pm(state, UART_PM_STATE_ON);
@@ -2717,10 +2716,10 @@ static int uart_poll_init(struct tty_driver *driver, int line, char *options)
ret = uart_set_options(port, NULL, baud, parity, bits, flow);
console_list_unlock();
}
-out:
+
if (ret)
uart_change_pm(state, pm_state);
- mutex_unlock(&tport->mutex);
+
return ret;
}
--
2.46.0
From: Martin Karsten <mkarsten(a)uwaterloo.ca>
A struct eventpoll's busy_poll_usecs field can be modified via a user
ioctl at any time. All reads of this field should be annotated with
READ_ONCE.
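The general pattern (an illustrative sketch with made-up names, not code
from the patch): a field that can be written locklessly from another
context is accessed with WRITE_ONCE()/READ_ONCE() so the compiler cannot
tear, fuse, or refetch the access:

	#include <linux/compiler.h>
	#include <linux/types.h>

	struct demo_ep {
		u32 busy_poll_usecs;	/* may be changed by ioctl at any time */
	};

	/* writer side (e.g. an ioctl handler): */
	static void demo_set_usecs(struct demo_ep *ep, u32 usecs)
	{
		WRITE_ONCE(ep->busy_poll_usecs, usecs);
	}

	/* reader side, as in ep_busy_loop_on(): */
	static bool demo_busy_loop_on(struct demo_ep *ep)
	{
		return !!READ_ONCE(ep->busy_poll_usecs);
	}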
Fixes: 85455c795c07 ("eventpoll: support busy poll per epoll instance")
Cc: stable(a)vger.kernel.org
Signed-off-by: Martin Karsten <mkarsten(a)uwaterloo.ca>
Reviewed-by: Joe Damato <jdamato(a)fastly.com>
---
fs/eventpoll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f53ca4f7fced..6d0e2f547ae7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -420,7 +420,7 @@ static bool busy_loop_ep_timeout(unsigned long start_time,
static bool ep_busy_loop_on(struct eventpoll *ep)
{
- return !!ep->busy_poll_usecs || net_busy_loop_on();
+ return !!READ_ONCE(ep->busy_poll_usecs) || net_busy_loop_on();
}
static bool ep_busy_loop_end(void *p, unsigned long start_time)
--
2.25.1
If the iovec inside the kmsg isn't already allocated AND one gets
expanded beyond the fixed size, then the request may not already have
been marked for cleanup. Ensure that it is.
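Conceptually, the rule is: whenever the request takes ownership of a
heap-allocated iovec, REQ_F_NEED_CLEANUP must be set so the generic
teardown path frees it. The fixed hunk with that spelled out (comments
added here for illustration; see the diff below for the verbatim change):

	if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
		/* expanded iovec was newly allocated by the bundle setup */
		kmsg->free_iov_nr = ret;
		kmsg->free_iov = arg.iovs;	  /* req now owns this memory */
		req->flags |= REQ_F_NEED_CLEANUP; /* so teardown frees it */
	}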
Cc: stable(a)vger.kernel.org
Fixes: a05d1f625c7a ("io_uring/net: support bundles for send")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
io_uring/net.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/io_uring/net.c b/io_uring/net.c
index 97a48408cec3..050bea5e7256 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -623,6 +623,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
kmsg->free_iov_nr = ret;
kmsg->free_iov = arg.iovs;
+ req->flags |= REQ_F_NEED_CLEANUP;
}
}
--
2.43.0
The quilt patch titled
Subject: padata: fix possible divide-by-0 panic in padata_mt_helper()
has been removed from the -mm tree. Its filename was
padata-fix-possible-divide-by-0-panic-in-padata_mt_helper.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: padata: Fix possible divide-by-0 panic in padata_mt_helper()
Date: Tue, 6 Aug 2024 13:46:47 -0400
We are hit with a not easily reproducible divide-by-0 panic in padata.c at
bootup time.
[ 10.017908] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
[ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
[ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
[ 10.017908] Workqueue: events_unbound padata_mt_helper
[ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
:
[ 10.017963] Call Trace:
[ 10.017968] <TASK>
[ 10.018004] ? padata_mt_helper+0x39/0xb0
[ 10.018084] process_one_work+0x174/0x330
[ 10.018093] worker_thread+0x266/0x3a0
[ 10.018111] kthread+0xcf/0x100
[ 10.018124] ret_from_fork+0x31/0x50
[ 10.018138] ret_from_fork_asm+0x1a/0x30
[ 10.018147] </TASK>
Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. Given the way chunk_size is
initialized in padata_do_multithreaded(), it can be 0 when the min_chunk
field in the passed-in padata_mt_job structure is 0.
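For illustration (a sketch using the surrounding code's names; the exact
computation lives in padata_do_multithreaded()), chunk_size comes from an
integer division, so a small job with min_chunk == 0 can leave it at 0:

	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
	ps.chunk_size = max(ps.chunk_size, job->min_chunk); /* still 0 if
							       min_chunk == 0 */
	ps.chunk_size = roundup(ps.chunk_size, job->align); /* roundup(0, n)
							       is 0 */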
Fix this divide-by-0 panic by making sure that chunk_size will be at least
1 no matter what the input parameters are.
Link: https://lkml.kernel.org/r/20240806174647.1050398-1-longman@redhat.com
Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Waiman Long <longman(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/padata.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/kernel/padata.c~padata-fix-possible-divide-by-0-panic-in-padata_mt_helper
+++ a/kernel/padata.c
@@ -517,6 +517,13 @@ void __init padata_do_multithreaded(stru
ps.chunk_size = max(ps.chunk_size, job->min_chunk);
ps.chunk_size = roundup(ps.chunk_size, job->align);
+ /*
+ * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
+ * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
+ */
+ if (!ps.chunk_size)
+ ps.chunk_size = 1U;
+
list_for_each_entry(pw, &works, pw_list)
if (job->numa_aware) {
int old_node = atomic_read(&last_used_nid);
_
Patches currently in -mm which might be from longman(a)redhat.com are
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu.patch
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu-v3.patch
lib-stackdepot-double-depot_pools_cap-if-kasan-is-enabled.patch
watchdog-handle-the-enodev-failure-case-of-lockup_detector_delay_init-separately.patch