The following changes since commit d082ecbc71e9e0bf49883ee4afd435a77a5101b6:
Linux 6.14-rc4 (2025-02-23 12:32:57 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 9d8960672d63db4b3b04542f5622748b345c637a:
vhost-scsi: Reduce response iov mem use (2025-02-25 07:10:46 -0500)
----------------------------------------------------------------
virtio: features, fixes, cleanups
A small number of improvements all over the place:
shutdown has been reworked to reset devices.
virtio fs is now allowed in vduse.
vhost-scsi memory use has been reduced.
cleanups, fixes all over the place.
A couple more fixes are being tested and will be merged after rc1.
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Eugenio Pérez (1):
vduse: add virtio_fs to allowed dev id
John Stultz (1):
sound/virtio: Fix cancel_sync warnings on uninitialized work_structs
Konstantin Shkolnyy (1):
vdpa/mlx5: Fix mlx5_vdpa_get_config() endianness on big-endian machines
Michael S. Tsirkin (1):
virtio: break and reset virtio devices on device_shutdown()
Mike Christie (9):
vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint
vhost-scsi: Reduce mem use by moving upages to per queue
vhost-scsi: Allocate T10 PI structs only when enabled
vhost-scsi: Add better resource allocation failure handling
vhost-scsi: Return queue full for page alloc failures during copy
vhost-scsi: Dynamically allocate scatterlists
vhost-scsi: Stop duplicating se_cmd fields
vhost-scsi: Allocate iov_iter used for unaligned copies when needed
vhost-scsi: Reduce response iov mem use
Si-Wei Liu (1):
vdpa/mlx5: Fix oversized null mkey longer than 32bit
Yufeng Wang (3):
tools/virtio: Add DMA_MAPPING_ERROR and sg_dma_len api define for virtio test
tools: virtio/linux/compiler.h: Add data_race() define.
tools: virtio/linux/module.h add MODULE_DESCRIPTION() define.
drivers/vdpa/mlx5/core/mr.c | 7 +-
drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +
drivers/vdpa/vdpa_user/vduse_dev.c | 1 +
drivers/vhost/Kconfig | 1 +
drivers/vhost/scsi.c | 549 +++++++++++++++++++++++--------------
drivers/virtio/virtio.c | 29 ++
sound/virtio/virtio_pcm.c | 21 +-
tools/virtio/linux/compiler.h | 25 ++
tools/virtio/linux/dma-mapping.h | 13 +
tools/virtio/linux/module.h | 7 +
10 files changed, 439 insertions(+), 217 deletions(-)
From: Wenlin Kang <wenlin.kang(a)windriver.com>
The tpidr2 selftest terminated with a 'Segmentation fault' during loading.
root@localhost:~# cd linux-kernel/tools/testing/selftests/arm64/abi && make
root@localhost:~/linux-kernel/tools/testing/selftests/arm64/abi# ./tpidr2
Segmentation fault
The cause is a failure in __arch_clear_user(), reached through this call chain:
load_elf_binary() [fs/binfmt_elf.c]
-> if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss)))
-> padzero()
-> clear_user() [arch/arm64/include/asm/uaccess.h]
-> __arch_clear_user() [arch/arm64/lib/clear_user.S]
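As a rough illustration of what padzero() asks clear_user() to do, the sketch below computes how many bytes lie between an address and the next page boundary. It is a userspace model only: the name model_padzero_len() and the explicit page_size parameter are hypothetical and do not appear in the kernel, which zeroes this range rather than just measuring it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model (not kernel code): number of bytes
 * from addr up to the next page boundary, or 0 if addr is already
 * page aligned. This is the range that would have to be zeroed.
 */
size_t model_padzero_len(size_t addr, size_t page_size)
{
	size_t nbyte;

	/* page_size is assumed to be a nonzero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	nbyte = addr & (page_size - 1);	/* offset inside the page */
	return nbyte ? page_size - nbyte : 0;
}
```

If the page holding that range cannot be written, zeroing it fails, which is the kind of failure the call chain above ends in.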
For more details, please see:
https://lore.kernel.org/lkml/1d0342f3-0474-482b-b6db-81ca7820a462@t-8ch.de/…
This issue has been fixed in mainline. I have backported the relevant
commits to the linux-6.1.y branch and attached them.
With these patches, tpidr2 runs as follows:
root@localhost:~/linux-kernel/tools/testing/selftests/arm64/abi# ./tpidr2
TAP version 13
1..5
ok 0 skipped, TPIDR2 not supported
ok 1 skipped, TPIDR2 not supported
ok 2 skipped, TPIDR2 not supported
ok 3 skipped, TPIDR2 not supported
ok 4 skipped, TPIDR2 not supported
The first patch is only a prerequisite that lets the following patches apply cleanly.
The second patch resolves this issue. However, to ensure
functional completeness, all related patches were backported
according to the following link.
https://lore.kernel.org/all/20230929031716.it.155-kees@kernel.org/#t
Bo Liu (1):
binfmt_elf: replace IS_ERR() with IS_ERR_VALUE()
Eric W. Biederman (1):
binfmt_elf: Support segments with 0 filesz and misaligned starts
Kees Cook (5):
binfmt_elf: elf_bss no longer used by load_elf_binary()
binfmt_elf: Use elf_load() for interpreter
binfmt_elf: Use elf_load() for library
binfmt_elf: Only report padzero() errors when PROT_WRITE
mm: Remove unused vm_brk()
fs/binfmt_elf.c | 221 ++++++++++++++++-----------------------------
include/linux/mm.h | 3 +-
mm/mmap.c | 6 --
mm/nommu.c | 5 -
4 files changed, 79 insertions(+), 156 deletions(-)
--
2.39.2
From: Steven Rostedt <rostedt(a)goodmis.org>
The trace event verifier checks the formats of trace events to make sure
that they do not point at memory that is not in the trace event itself or
in data that will never be freed. If an event references data that was
allocated when the event triggered and that same data is freed before the
event is read, then the kernel can crash by reading freed memory.
The verifier runs at boot up (or module load) and scans the print formats
of the events and checks their arguments to make sure that dereferenced
pointers are safe. If the format uses "%*p.." the verifier will ignore it,
and that could be dangerous. Cover this case as well.
Also add to the sample code a use case of "%*pbl".
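To make the "%*p" case concrete, here is a minimal userspace sketch of the argument bookkeeping involved. It is not the kernel's test_event_printk(); the function name first_star_pointer_arg() and the simplified set of flag and length characters are assumptions for illustration. The point it shows is that a leading '*' consumes one argument of its own, so the pointer that has to be checked is the argument after it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model (not the kernel verifier): return the zero-based
 * index of the argument consumed by the first "%*p" conversion in fmt,
 * or -1 if fmt has no such conversion.
 */
int first_star_pointer_arg(const char *fmt)
{
	int arg = 0;	/* index of the next argument to be consumed */
	size_t i = 0;

	assert(fmt != NULL);

	while (fmt[i] != '\0') {
		size_t start;

		if (fmt[i] != '%') {
			i++;
			continue;
		}
		i++;			/* step past '%' */
		if (fmt[i] == '%') {	/* literal "%%" consumes nothing */
			i++;
			continue;
		}
		start = i;
		/* Walk flags, width, precision and length modifiers. */
		while (fmt[i] != '\0' &&
		       strchr("-+ #0123456789.lhzjt*", fmt[i]) != NULL) {
			if (fmt[i] == '*') {
				arg++;	/* the '*' takes its own argument */
				if (i == start && fmt[i + 1] == 'p')
					return arg;	/* pointer comes next */
			}
			i++;
		}
		if (fmt[i] == '\0')
			break;
		i++;			/* the conversion character itself */
		arg++;
	}
	return -1;
}
```

For a format such as "foo %s %d %*pbl", the two leading conversions take arguments 0 and 1, the '*' takes argument 2, and the bitmap pointer is argument 3.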
Link: https://lore.kernel.org/all/bcba4d76-2c3f-4d11-baf0-02905db953dd@oracle.com/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Link: https://lore.kernel.org/20250327195311.2d89ec66@gandalf.local.home
Reported-by: Libo Chen <libo.chen(a)oracle.com>
Reviewed-by: Libo Chen <libo.chen(a)oracle.com>
Tested-by: Libo Chen <libo.chen(a)oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events.c | 7 +++++++
samples/trace_events/trace-events-sample.h | 8 ++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 8638b7f7ff85..069e92856bda 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -470,6 +470,7 @@ static void test_event_printk(struct trace_event_call *call)
case '%':
continue;
case 'p':
+ do_pointer:
/* Find dereferencing fields */
switch (fmt[i + 1]) {
case 'B': case 'R': case 'r':
@@ -498,6 +499,12 @@ static void test_event_printk(struct trace_event_call *call)
continue;
if (fmt[i + j] == '*') {
star = true;
+ /* Handle %*pbl case */
+ if (!j && fmt[i + 1] == 'p') {
+ arg++;
+ i++;
+ goto do_pointer;
+ }
continue;
}
if ((fmt[i + j] == 's')) {
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 999f78d380ae..1a05fc153353 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -319,7 +319,8 @@ TRACE_EVENT(foo_bar,
__assign_cpumask(cpum, cpumask_bits(mask));
),
- TP_printk("foo %s %d %s %s %s %s %s %s (%s) (%s) %s", __entry->foo, __entry->bar,
+ TP_printk("foo %s %d %s %s %s %s %s %s (%s) (%s) %s [%d] %*pbl",
+ __entry->foo, __entry->bar,
/*
* Notice here the use of some helper functions. This includes:
@@ -370,7 +371,10 @@ TRACE_EVENT(foo_bar,
__get_str(str), __get_str(lstr),
__get_bitmask(cpus), __get_cpumask(cpum),
- __get_str(vstr))
+ __get_str(vstr),
+ __get_dynamic_array_len(cpus),
+ __get_dynamic_array_len(cpus),
+ __get_dynamic_array(cpus))
);
/*
--
2.47.2
From: zhoumin <teczm(a)foxmail.com>
When the kernel contains a large number of functions that can be traced,
the loop in ftrace_graph_set_hash() may take a lot of time to execute.
This may trigger the softlockup watchdog.
Add cond_resched() within the loop to allow the kernel to remain
responsive even when processing a large number of functions.
This matches the cond_resched() that is used in other locations of the
code that iterates over all functions that can be traced.
Cc: stable(a)vger.kernel.org
Fixes: b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables")
Link: https://lore.kernel.org/tencent_3E06CE338692017B5809534B9C5C03DA7705@qq.com
Signed-off-by: zhoumin <teczm(a)foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 92015de6203d..1a48aedb5255 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6855,6 +6855,7 @@ ftrace_graph_set_hash(struct ftrace_hash *hash, char *buffer)
}
}
}
+ cond_resched();
} while_for_each_ftrace_rec();
return fail ? -EINVAL : 0;
--
2.47.2
From: Steven Rostedt <rostedt(a)goodmis.org>
Some architectures do not have data cache coherency between user and
kernel space. For these architectures, the cache needs to be flushed on
both the kernel and user addresses so that user space can see the updates
the kernel has made.
Instead of using flush_dcache_folio() and playing with virt_to_folio()
within the call to that function, use flush_kernel_vmap_range() which
takes the virtual address and does the work for those architectures that
need it.
This also fixes a bug where the flush of the reader page only flushed one
page. If the sub-buffer order is 1 or more, where the sub-buffer size
would be greater than a page, it would miss the rest of the sub-buffer
content, as the "reader page" is not just a page, but the size of a
sub-buffer.
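The size mismatch can be worked through with a small userspace model. The 4096-byte page size and the helper names below are assumptions for illustration, as is the premise that a sub-buffer of a given order spans the page size shifted left by that order.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

/* Bytes spanned by one sub-buffer of the given order (assumed layout). */
size_t model_subbuf_bytes(unsigned int order)
{
	assert(order < 16);	/* keep the shift well inside the type */
	return MODEL_PAGE_SIZE << order;
}

/* Bytes left unflushed when only the first page is flushed. */
size_t model_missed_bytes(unsigned int order)
{
	return model_subbuf_bytes(order) - MODEL_PAGE_SIZE;
}
```

With order 0 nothing is missed, which is why the problem only shows up once the sub-buffer order is raised; with order 1 a full page of the reader sub-buffer would be skipped.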
Link: https://lore.kernel.org/all/CAG48ez3w0my4Rwttbc5tEbNsme6tc0mrSN95thjXUFaJ3a…
Cc: stable(a)vger.kernel.org
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Suggested-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d8d7b28e2c2f..c0f877d39a24 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6016,7 +6016,7 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
meta->read = cpu_buffer->read;
/* Some archs do not have data cache coherency between kernel and user-space */
- flush_dcache_folio(virt_to_folio(cpu_buffer->meta_page));
+ flush_kernel_vmap_range(cpu_buffer->meta_page, PAGE_SIZE);
}
static void
@@ -7319,7 +7319,8 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
out:
/* Some archs do not have data cache coherency between kernel and user-space */
- flush_dcache_folio(virt_to_folio(cpu_buffer->reader_page->page));
+ flush_kernel_vmap_range(cpu_buffer->reader_page->page,
+ buffer->subbuf_size + BUF_PAGE_HDR_SIZE);
rb_update_meta_page(cpu_buffer);
--
2.47.2
The quilt patch titled
Subject: lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets
has been removed from the -mm tree. Its filename was
lib-scatterlist-fix-sg_split_phys-to-preserve-original-scatterlist-offsets.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: T Pratham <t-pratham(a)ti.com>
Subject: lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets
Date: Wed, 19 Mar 2025 16:44:38 +0530
The sg_split_phys function was incorrectly setting the offsets of all
scatterlist entries (except the first) to 0. Only the first scatterlist
entry's offset and length need to be modified to account for the skip.
Setting the remaining entries' offsets to 0 could lead to incorrect data
access.
I am using this function in a crypto driver that I'm currently developing
(not yet sent to mailing list). During testing, it was observed that the
output scatterlists (except the first one) contained incorrect garbage
data.
I narrowed this issue down to the call of sg_split(). Upon debugging
inside this function, I found that this resetting of offset is the cause
of the problem, causing the subsequent scatterlists to point to incorrect
memory locations in a page. By removing this code, I am obtaining
expected data in all the split output scatterlists. Thus, this was indeed
causing observable runtime effects!
This patch removes the offending code, ensuring that the page offsets in
the input scatterlist are preserved in the output scatterlist.
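The effect can be reproduced with a small userspace model. The struct and function below are hypothetical stand-ins that keep only the offset and length fields; they are not the kernel's struct scatterlist or sg_split_phys(). The zero_rest flag switches between the old behaviour and the behaviour after this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two scatterlist fields that matter here. */
struct model_sg {
	unsigned int offset;	/* byte offset into the page */
	unsigned int length;	/* bytes covered by this entry */
};

/*
 * Copy n entries from in[] to out[], skipping skip_sg0 bytes of the
 * first entry. A nonzero zero_rest reproduces the pre-patch behaviour
 * of clearing the offset of every entry after the first.
 */
void model_split(const struct model_sg *in, struct model_sg *out, size_t n,
		 unsigned int skip_sg0, int zero_rest)
{
	assert(n == 0 || skip_sg0 <= in[0].length);

	for (size_t j = 0; j < n; j++) {
		out[j] = in[j];
		if (j == 0) {
			out[j].offset += skip_sg0;
			out[j].length -= skip_sg0;
		} else if (zero_rest) {
			out[j].offset = 0;	/* loses the original offset */
		}
	}
}
```

With an input whose second entry starts at offset 128, the old behaviour yields offset 0 for that entry, so it points at the wrong bytes of the page; the patched behaviour keeps 128.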
Link: https://lkml.kernel.org/r/20250319111437.1969903-1-t-pratham@ti.com
Fixes: f8bcbe62acd0 ("lib: scatterlist: add sg splitting function")
Signed-off-by: T Pratham <t-pratham(a)ti.com>
Cc: Robert Jarzmik <robert.jarzmik(a)free.fr>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Kamlesh Gurudasani <kamlesh(a)ti.com>
Cc: Praneeth Bajjuri <praneeth(a)ti.com>
Cc: Vignesh Raghavendra <vigneshr(a)ti.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/sg_split.c | 2 --
1 file changed, 2 deletions(-)
--- a/lib/sg_split.c~lib-scatterlist-fix-sg_split_phys-to-preserve-original-scatterlist-offsets
+++ a/lib/sg_split.c
@@ -88,8 +88,6 @@ static void sg_split_phys(struct sg_spli
if (!j) {
out_sg->offset += split->skip_sg0;
out_sg->length -= split->skip_sg0;
- } else {
- out_sg->offset = 0;
}
sg_dma_address(out_sg) = 0;
sg_dma_len(out_sg) = 0;
_
Patches currently in -mm which might be from t-pratham(a)ti.com are
The quilt patch titled
Subject: mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
has been removed from the -mm tree. Its filename was
mm-zswap-fix-crypto_free_acomp-deadlock-in-zswap_cpu_comp_dead.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Subject: mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
Date: Wed, 26 Feb 2025 18:56:25 +0000
Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
the per-CPU acomp_ctx mutex. crypto_free_acomp() then holds scomp_lock
(through crypto_exit_scomp_ops_async()).
On the other hand, crypto_alloc_acomp_node() holds the scomp_lock (through
crypto_scomp_init_tfm()), and then allocates memory. If the allocation
results in reclaim, we may attempt to hold the per-CPU acomp_ctx mutex.
The above dependencies can cause an ABBA deadlock. For example in the
following scenario:
(1) Task A running on CPU #1:
crypto_alloc_acomp_node()
Holds scomp_lock
Enters reclaim
Reads per_cpu_ptr(pool->acomp_ctx, 1)
(2) Task A is descheduled
(3) CPU #1 goes offline
zswap_cpu_comp_dead(CPU #1)
Holds per_cpu_ptr(pool->acomp_ctx, 1)
Calls crypto_free_acomp()
Waits for scomp_lock
(4) Task A running on CPU #2:
Waits for per_cpu_ptr(pool->acomp_ctx, 1) // Read on CPU #1
DEADLOCK
Since there is no requirement to call crypto_free_acomp() with the per-CPU
acomp_ctx mutex held in zswap_cpu_comp_dead(), move it after the mutex is
unlocked. Also move the acomp_request_free() and kfree() calls for
consistency and to avoid any potential subtle locking dependencies in the
future.
With this, only setting acomp_ctx fields to NULL occurs with the mutex
held. This is similar to how zswap_cpu_comp_prepare() only initializes
acomp_ctx fields with the mutex held, after performing all allocations
before holding the mutex.
Opportunistically, move the NULL check on acomp_ctx so that it takes place
before the mutex dereference.
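The general pattern (detach the resources under the lock, release them after unlocking) can be sketched in userspace as follows. This is an analogy under stated assumptions, not the zswap code: struct model_ctx and model_ctx_teardown() are hypothetical names, a pthread mutex stands in for the per-CPU acomp_ctx mutex, and free() stands in for the kernel's release functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical context: one lock protecting one owned buffer. */
struct model_ctx {
	pthread_mutex_t mutex;
	void *buffer;
};

/*
 * Detach the buffer while holding the mutex, then free it only after
 * the mutex is released, so the release path never runs with the
 * mutex held and cannot take part in a lock-order inversion.
 */
void model_ctx_teardown(struct model_ctx *ctx)
{
	void *buffer;
	int rc;

	if (ctx == NULL)	/* check before touching ctx->mutex */
		return;

	rc = pthread_mutex_lock(&ctx->mutex);
	assert(rc == 0);
	buffer = ctx->buffer;
	ctx->buffer = NULL;	/* only pointer updates happen under the lock */
	rc = pthread_mutex_unlock(&ctx->mutex);
	assert(rc == 0);

	free(buffer);		/* free(NULL) is a no-op */
}
```

Calling the function twice is harmless in this model, because the second call finds the pointer already cleared.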
Link: https://lkml.kernel.org/r/20250226185625.2672936-1-yosry.ahmed@linux.dev
Fixes: 12dcb0ef5406 ("mm: zswap: properly synchronize freeing resources during CPU hotunplug")
Co-developed-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Reported-by: syzbot+1a517ccfcbc6a7ab0f82(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67bcea51.050a0220.bbfd1.0096.GAE@google.com/
Acked-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Reviewed-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Reviewed-by: Nhat Pham <nphamcs(a)gmail.com>
Tested-by: Nhat Pham <nphamcs(a)gmail.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Eric Biggers <ebiggers(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Chris Murphy <lists(a)colorremedies.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
--- a/mm/zswap.c~mm-zswap-fix-crypto_free_acomp-deadlock-in-zswap_cpu_comp_dead
+++ a/mm/zswap.c
@@ -883,18 +883,32 @@ static int zswap_cpu_comp_dead(unsigned
{
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
+ struct acomp_req *req;
+ struct crypto_acomp *acomp;
+ u8 *buffer;
+
+ if (IS_ERR_OR_NULL(acomp_ctx))
+ return 0;
mutex_lock(&acomp_ctx->mutex);
- if (!IS_ERR_OR_NULL(acomp_ctx)) {
- if (!IS_ERR_OR_NULL(acomp_ctx->req))
- acomp_request_free(acomp_ctx->req);
- acomp_ctx->req = NULL;
- if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
- crypto_free_acomp(acomp_ctx->acomp);
- kfree(acomp_ctx->buffer);
- }
+ req = acomp_ctx->req;
+ acomp = acomp_ctx->acomp;
+ buffer = acomp_ctx->buffer;
+ acomp_ctx->req = NULL;
+ acomp_ctx->acomp = NULL;
+ acomp_ctx->buffer = NULL;
mutex_unlock(&acomp_ctx->mutex);
+ /*
+ * Do the actual freeing after releasing the mutex to avoid subtle
+ * locking dependencies causing deadlocks.
+ */
+ if (!IS_ERR_OR_NULL(req))
+ acomp_request_free(req);
+ if (!IS_ERR_OR_NULL(acomp))
+ crypto_free_acomp(acomp);
+ kfree(buffer);
+
return 0;
}
_
Patches currently in -mm which might be from yosry.ahmed(a)linux.dev are