The restoring of TPIDR2 signal context has been broken since it was
merged, fix this and add a test case covering it. This is a result of
TPIDR2 context management following a different flow to any of the other
state that we provide and the fact that we don't expose TPIDR (which
follows the same pattern) to signals.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Added a feature check for SME to the new test.
- Link to v1: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v1-0-b6d…
---
Mark Brown (2):
arm64/signal: Restore TPIDR2 register rather than memory state
kselftest/arm64: Add a test case for TPIDR2 restore
arch/arm64/kernel/signal.c | 2 +-
tools/testing/selftests/arm64/signal/.gitignore | 2 +-
.../arm64/signal/testcases/tpidr2_restore.c | 86 ++++++++++++++++++++++
3 files changed, 88 insertions(+), 2 deletions(-)
---
base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375
change-id: 20230621-arm64-fix-tpidr2-signal-restore-713d93798f99
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 54abe19e00cfcc5a72773d15cd00ed19ab763439
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062330-bulb-sadden-dffe@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54abe19e00cfcc5a72773d15cd00ed19ab763439 Mon Sep 17 00:00:00 2001
From: Rafael Aquini <aquini(a)redhat.com>
Date: Tue, 6 Jun 2023 19:36:13 -0400
Subject: [PATCH] writeback: fix dereferencing NULL mapping->host on
writeback_page_template
When commit 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for
wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
as a template to create its new wait_on_page_writeback trace event, it
ended up opening a window to NULL pointer dereference crashes due to the
(infrequent) occurrence of a race where an access to a page in the
swap-cache happens concurrently with the moment this page is being written
to disk and the tracepoint is enabled:
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x76/0x170
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x65/0x150
? asm_exc_page_fault+0x22/0x30
? trace_event_raw_event_writeback_folio_template+0x76/0xf0
folio_wait_writeback+0x6b/0x80
shmem_swapin_folio+0x24a/0x500
? filemap_get_entry+0xe3/0x140
shmem_get_folio_gfp+0x36e/0x7c0
? find_busiest_group+0x43/0x1a0
shmem_fault+0x76/0x2a0
? __update_load_avg_cfs_rq+0x281/0x2f0
__do_fault+0x33/0x130
do_read_fault+0x118/0x160
do_pte_missing+0x1ed/0x2a0
__handle_mm_fault+0x566/0x630
handle_mm_fault+0x91/0x210
do_user_addr_fault+0x22c/0x740
exc_page_fault+0x65/0x150
asm_exc_page_fault+0x22/0x30
This problem arises from the fact that the repurposed writeback_dirty_page
trace event code was written assuming that every pointer to mapping
(struct address_space) would come from a file-mapped page-cache object,
thus mapping->host would always be populated, and that was a valid case
before commit 19343b5bdd16. The swap-cache address space
(swapper_spaces), however, doesn't populate its ->host (struct inode)
pointer, thus leading to the crashes in the corner-case aforementioned.
commit 19343b5bdd16 ended up breaking the assignment of __entry->name and
__entry->ino for the wait_on_page_writeback tracepoint -- both dependent
on mapping->host carrying a pointer to a valid inode. The assignment of
__entry->name was fixed by commit 68f23b89067f ("memcg: fix a crash in
wb_workfn when a device disappears"), and this commit fixes the remaining
case, for __entry->ino.
Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com
Fixes: 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
Signed-off-by: Rafael Aquini <aquini(a)redhat.com>
Reviewed-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: Aristeu Rozanski <aris(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 86b2a82da546..54e353c9f919 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(writeback_folio_template,
strscpy_pad(__entry->name,
bdi_dev_name(mapping ? inode_to_bdi(mapping->host) :
NULL), 32);
- __entry->ino = mapping ? mapping->host->i_ino : 0;
+ __entry->ino = (mapping && mapping->host) ? mapping->host->i_ino : 0;
__entry->index = folio->index;
),
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 54abe19e00cfcc5a72773d15cd00ed19ab763439
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062333-security-reenact-e4a6@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54abe19e00cfcc5a72773d15cd00ed19ab763439 Mon Sep 17 00:00:00 2001
From: Rafael Aquini <aquini(a)redhat.com>
Date: Tue, 6 Jun 2023 19:36:13 -0400
Subject: [PATCH] writeback: fix dereferencing NULL mapping->host on
writeback_page_template
When commit 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for
wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
as a template to create its new wait_on_page_writeback trace event, it
ended up opening a window to NULL pointer dereference crashes due to the
(infrequent) occurrence of a race where an access to a page in the
swap-cache happens concurrently with the moment this page is being written
to disk and the tracepoint is enabled:
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x76/0x170
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x65/0x150
? asm_exc_page_fault+0x22/0x30
? trace_event_raw_event_writeback_folio_template+0x76/0xf0
folio_wait_writeback+0x6b/0x80
shmem_swapin_folio+0x24a/0x500
? filemap_get_entry+0xe3/0x140
shmem_get_folio_gfp+0x36e/0x7c0
? find_busiest_group+0x43/0x1a0
shmem_fault+0x76/0x2a0
? __update_load_avg_cfs_rq+0x281/0x2f0
__do_fault+0x33/0x130
do_read_fault+0x118/0x160
do_pte_missing+0x1ed/0x2a0
__handle_mm_fault+0x566/0x630
handle_mm_fault+0x91/0x210
do_user_addr_fault+0x22c/0x740
exc_page_fault+0x65/0x150
asm_exc_page_fault+0x22/0x30
This problem arises from the fact that the repurposed writeback_dirty_page
trace event code was written assuming that every pointer to mapping
(struct address_space) would come from a file-mapped page-cache object,
thus mapping->host would always be populated, and that was a valid case
before commit 19343b5bdd16. The swap-cache address space
(swapper_spaces), however, doesn't populate its ->host (struct inode)
pointer, thus leading to the crashes in the corner-case aforementioned.
commit 19343b5bdd16 ended up breaking the assignment of __entry->name and
__entry->ino for the wait_on_page_writeback tracepoint -- both dependent
on mapping->host carrying a pointer to a valid inode. The assignment of
__entry->name was fixed by commit 68f23b89067f ("memcg: fix a crash in
wb_workfn when a device disappears"), and this commit fixes the remaining
case, for __entry->ino.
Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com
Fixes: 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
Signed-off-by: Rafael Aquini <aquini(a)redhat.com>
Reviewed-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: Aristeu Rozanski <aris(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 86b2a82da546..54e353c9f919 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(writeback_folio_template,
strscpy_pad(__entry->name,
bdi_dev_name(mapping ? inode_to_bdi(mapping->host) :
NULL), 32);
- __entry->ino = mapping ? mapping->host->i_ino : 0;
+ __entry->ino = (mapping && mapping->host) ? mapping->host->i_ino : 0;
__entry->index = folio->index;
),
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x ef7dfac51d8ed961b742218f526bd589f3900a59
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062306-omen-dance-80f0@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ef7dfac51d8ed961b742218f526bd589f3900a59 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Sat, 17 Jun 2023 19:50:24 -0600
Subject: [PATCH] io_uring/poll: serialize poll linked timer start with poll
removal
We selectively grab the ctx->uring_lock for poll update/removal, but
we really should grab it from the start to fully synchronize with
linked timeouts. Normally this is indeed the case, but if requests
are forced async by the application, we don't fully cover removal
and timer disarm within the uring_lock.
Make this simpler by having consistent locking state for poll removal.
Cc: stable(a)vger.kernel.org # 6.1+
Reported-by: Querijn Voet <querijnqyn(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index c90e47dc1e29..a78b8af7d9ab 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -977,8 +977,9 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
struct io_hash_bucket *bucket;
struct io_kiocb *preq;
int ret2, ret = 0;
- struct io_tw_state ts = {};
+ struct io_tw_state ts = { .locked = true };
+ io_ring_submit_lock(ctx, issue_flags);
preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
ret2 = io_poll_disarm(preq);
if (bucket)
@@ -990,12 +991,10 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
goto out;
}
- io_ring_submit_lock(ctx, issue_flags);
preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table_locked, &bucket);
ret2 = io_poll_disarm(preq);
if (bucket)
spin_unlock(&bucket->lock);
- io_ring_submit_unlock(ctx, issue_flags);
if (ret2) {
ret = ret2;
goto out;
@@ -1019,7 +1018,7 @@ found:
if (poll_update->update_user_data)
preq->cqe.user_data = poll_update->new_user_data;
- ret2 = io_poll_add(preq, issue_flags);
+ ret2 = io_poll_add(preq, issue_flags & ~IO_URING_F_UNLOCKED);
/* successfully updated, don't complete poll request */
if (!ret2 || ret2 == -EIOCBQUEUED)
goto out;
@@ -1027,9 +1026,9 @@ found:
req_set_fail(preq);
io_req_set_res(preq, -ECANCELED, 0);
- ts.locked = !(issue_flags & IO_URING_F_UNLOCKED);
io_req_task_complete(preq, &ts);
out:
+ io_ring_submit_unlock(ctx, issue_flags);
if (ret < 0) {
req_set_fail(req);
return ret;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ef7dfac51d8ed961b742218f526bd589f3900a59
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062307-taco-nurture-70a2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ef7dfac51d8ed961b742218f526bd589f3900a59 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Sat, 17 Jun 2023 19:50:24 -0600
Subject: [PATCH] io_uring/poll: serialize poll linked timer start with poll
removal
We selectively grab the ctx->uring_lock for poll update/removal, but
we really should grab it from the start to fully synchronize with
linked timeouts. Normally this is indeed the case, but if requests
are forced async by the application, we don't fully cover removal
and timer disarm within the uring_lock.
Make this simpler by having consistent locking state for poll removal.
Cc: stable(a)vger.kernel.org # 6.1+
Reported-by: Querijn Voet <querijnqyn(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index c90e47dc1e29..a78b8af7d9ab 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -977,8 +977,9 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
struct io_hash_bucket *bucket;
struct io_kiocb *preq;
int ret2, ret = 0;
- struct io_tw_state ts = {};
+ struct io_tw_state ts = { .locked = true };
+ io_ring_submit_lock(ctx, issue_flags);
preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
ret2 = io_poll_disarm(preq);
if (bucket)
@@ -990,12 +991,10 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
goto out;
}
- io_ring_submit_lock(ctx, issue_flags);
preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table_locked, &bucket);
ret2 = io_poll_disarm(preq);
if (bucket)
spin_unlock(&bucket->lock);
- io_ring_submit_unlock(ctx, issue_flags);
if (ret2) {
ret = ret2;
goto out;
@@ -1019,7 +1018,7 @@ found:
if (poll_update->update_user_data)
preq->cqe.user_data = poll_update->new_user_data;
- ret2 = io_poll_add(preq, issue_flags);
+ ret2 = io_poll_add(preq, issue_flags & ~IO_URING_F_UNLOCKED);
/* successfully updated, don't complete poll request */
if (!ret2 || ret2 == -EIOCBQUEUED)
goto out;
@@ -1027,9 +1026,9 @@ found:
req_set_fail(preq);
io_req_set_res(preq, -ECANCELED, 0);
- ts.locked = !(issue_flags & IO_URING_F_UNLOCKED);
io_req_task_complete(preq, &ts);
out:
+ io_ring_submit_unlock(ctx, issue_flags);
if (ret < 0) {
req_set_fail(req);
return ret;
Hi,
I would like to request for cherry-picking commit 7074732c8fae ("ip_tunnels:
allow VXLAN/GENEVE to inherit TOS/TTL from VLAN") in linux-5.15.y and
linux-5.4.y branches.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
This commit has lived since a long time in upstream (11 months), the potential
regressions seems low. The cherry-pick is straightforward.
It fixes the vxlan tos inherit option when vlan frames are encapsulated in vxlan.
The kernel 5.4 and 5.15 are used by a lot of vendors, having this patch will fix
this bug.
Regards,
Nicolas
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062338-rebuild-ground-2867@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 7 Jun 2023 10:56:00 +1000
Subject: [PATCH] powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs
enabled
Switching mm and tinkering with current->active_mm should be done with
irqs disabled. There is a path where exit_lazy_flush_tlb can be called
with irqs enabled:
exit_lazy_flush_tlb
flush_type_needed
__flush_all_mm
tlb_finish_mmu
exit_mmap
Which results in the switching being done with irqs enabled, which is
incorrect.
Fixes: a665eec0a22e ("powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm")
Cc: stable(a)vger.kernel.org # v5.10+
Reported-by: Sachin Sant <sachinp(a)linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/A9A5D83D-BA70-47A4-BCB4-30C1AE19BC22@l…
Tested-by: Sachin Sant <sachinp(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230607005601.583293-1-npiggin@gmail.com
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index ce804b7bf84e..0bd4866d9824 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -795,12 +795,20 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
goto out;
if (current->active_mm == mm) {
+ unsigned long flags;
+
WARN_ON_ONCE(current->mm != NULL);
- /* Is a kernel thread and is using mm as the lazy tlb */
+ /*
+ * It is a kernel thread and is using mm as the lazy tlb, so
+ * switch it to init_mm. This is not always called from IPI
+ * (e.g., flush_type_needed), so must disable irqs.
+ */
+ local_irq_save(flags);
mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
switch_mm_irqs_off(mm, &init_mm, current);
mmdrop_lazy_tlb(mm);
+ local_irq_restore(flags);
}
/*
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062337-senorita-unreeling-8c43@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 7 Jun 2023 10:56:00 +1000
Subject: [PATCH] powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs
enabled
Switching mm and tinkering with current->active_mm should be done with
irqs disabled. There is a path where exit_lazy_flush_tlb can be called
with irqs enabled:
exit_lazy_flush_tlb
flush_type_needed
__flush_all_mm
tlb_finish_mmu
exit_mmap
Which results in the switching being done with irqs enabled, which is
incorrect.
Fixes: a665eec0a22e ("powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm")
Cc: stable(a)vger.kernel.org # v5.10+
Reported-by: Sachin Sant <sachinp(a)linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/A9A5D83D-BA70-47A4-BCB4-30C1AE19BC22@l…
Tested-by: Sachin Sant <sachinp(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230607005601.583293-1-npiggin@gmail.com
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index ce804b7bf84e..0bd4866d9824 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -795,12 +795,20 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
goto out;
if (current->active_mm == mm) {
+ unsigned long flags;
+
WARN_ON_ONCE(current->mm != NULL);
- /* Is a kernel thread and is using mm as the lazy tlb */
+ /*
+ * It is a kernel thread and is using mm as the lazy tlb, so
+ * switch it to init_mm. This is not always called from IPI
+ * (e.g., flush_type_needed), so must disable irqs.
+ */
+ local_irq_save(flags);
mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
switch_mm_irqs_off(mm, &init_mm, current);
mmdrop_lazy_tlb(mm);
+ local_irq_restore(flags);
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062336-ivy-sweep-e805@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 7 Jun 2023 10:56:00 +1000
Subject: [PATCH] powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs
enabled
Switching mm and tinkering with current->active_mm should be done with
irqs disabled. There is a path where exit_lazy_flush_tlb can be called
with irqs enabled:
exit_lazy_flush_tlb
flush_type_needed
__flush_all_mm
tlb_finish_mmu
exit_mmap
Which results in the switching being done with irqs enabled, which is
incorrect.
Fixes: a665eec0a22e ("powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm")
Cc: stable(a)vger.kernel.org # v5.10+
Reported-by: Sachin Sant <sachinp(a)linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/A9A5D83D-BA70-47A4-BCB4-30C1AE19BC22@l…
Tested-by: Sachin Sant <sachinp(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230607005601.583293-1-npiggin@gmail.com
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index ce804b7bf84e..0bd4866d9824 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -795,12 +795,20 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
goto out;
if (current->active_mm == mm) {
+ unsigned long flags;
+
WARN_ON_ONCE(current->mm != NULL);
- /* Is a kernel thread and is using mm as the lazy tlb */
+ /*
+ * It is a kernel thread and is using mm as the lazy tlb, so
+ * switch it to init_mm. This is not always called from IPI
+ * (e.g., flush_type_needed), so must disable irqs.
+ */
+ local_irq_save(flags);
mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
switch_mm_irqs_off(mm, &init_mm, current);
mmdrop_lazy_tlb(mm);
+ local_irq_restore(flags);
}
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062335-earthen-worsening-7910@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 7 Jun 2023 10:56:00 +1000
Subject: [PATCH] powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs
enabled
Switching mm and tinkering with current->active_mm should be done with
irqs disabled. There is a path where exit_lazy_flush_tlb can be called
with irqs enabled:
exit_lazy_flush_tlb
flush_type_needed
__flush_all_mm
tlb_finish_mmu
exit_mmap
Which results in the switching being done with irqs enabled, which is
incorrect.
Fixes: a665eec0a22e ("powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm")
Cc: stable(a)vger.kernel.org # v5.10+
Reported-by: Sachin Sant <sachinp(a)linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/A9A5D83D-BA70-47A4-BCB4-30C1AE19BC22@l…
Tested-by: Sachin Sant <sachinp(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230607005601.583293-1-npiggin@gmail.com
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index ce804b7bf84e..0bd4866d9824 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -795,12 +795,20 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
goto out;
if (current->active_mm == mm) {
+ unsigned long flags;
+
WARN_ON_ONCE(current->mm != NULL);
- /* Is a kernel thread and is using mm as the lazy tlb */
+ /*
+ * It is a kernel thread and is using mm as the lazy tlb, so
+ * switch it to init_mm. This is not always called from IPI
+ * (e.g., flush_type_needed), so must disable irqs.
+ */
+ local_irq_save(flags);
mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
switch_mm_irqs_off(mm, &init_mm, current);
mmdrop_lazy_tlb(mm);
+ local_irq_restore(flags);
}
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062330-dwarf-gazing-415c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 7 Jun 2023 10:56:00 +1000
Subject: [PATCH] powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs
enabled
Switching mm and tinkering with current->active_mm should be done with
irqs disabled. There is a path where exit_lazy_flush_tlb can be called
with irqs enabled:
exit_lazy_flush_tlb
flush_type_needed
__flush_all_mm
tlb_finish_mmu
exit_mmap
Which results in the switching being done with irqs enabled, which is
incorrect.
Fixes: a665eec0a22e ("powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm")
Cc: stable(a)vger.kernel.org # v5.10+
Reported-by: Sachin Sant <sachinp(a)linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/A9A5D83D-BA70-47A4-BCB4-30C1AE19BC22@l…
Tested-by: Sachin Sant <sachinp(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230607005601.583293-1-npiggin@gmail.com
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index ce804b7bf84e..0bd4866d9824 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -795,12 +795,20 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
goto out;
if (current->active_mm == mm) {
+ unsigned long flags;
+
WARN_ON_ONCE(current->mm != NULL);
- /* Is a kernel thread and is using mm as the lazy tlb */
+ /*
+ * It is a kernel thread and is using mm as the lazy tlb, so
+ * switch it to init_mm. This is not always called from IPI
+ * (e.g., flush_type_needed), so must disable irqs.
+ */
+ local_irq_save(flags);
mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
switch_mm_irqs_off(mm, &init_mm, current);
mmdrop_lazy_tlb(mm);
+ local_irq_restore(flags);
}
/*
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 0108a4e9f3584a7a2c026d1601b0682ff7335d95
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062328-deception-unadvised-6e89@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0108a4e9f3584a7a2c026d1601b0682ff7335d95 Mon Sep 17 00:00:00 2001
From: Krister Johansen <kjlx(a)templeofstupid.com>
Date: Mon, 12 Jun 2023 17:44:40 -0700
Subject: [PATCH] bpf: ensure main program has an extable
When subprograms are in use, the main program is not jit'd after the
subprograms because jit_subprogs sets a value for prog->bpf_func upon
success. Subsequent calls to the JIT are bypassed when this value is
non-NULL. This leads to a situation where the main program and its
func[0] counterpart are both in the bpf kallsyms tree, but only func[0]
has an extable. Extables are only created during JIT. Now there are
two nearly identical program ksym entries in the tree, but only one has
an extable. Depending upon how the entries are placed, there's a chance
that a fault will call search_extable on the aux with the NULL entry.
Since jit_subprogs already copies state from func[0] to the main
program, include the extable pointer in this state duplication.
Additionally, ensure that the copy of the main program in func[0] is not
added to the bpf_prog_kallsyms table. Instead, let the main program get
added later in bpf_prog_load(). This ensures there is only a single
copy of the main program in the kallsyms table, and that its tag matches
the tag observed by tooling like bpftool.
Cc: stable(a)vger.kernel.org
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com>
Acked-by: Yonghong Song <yhs(a)fb.com>
Acked-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Link: https://lore.kernel.org/r/6de9b2f4b4724ef56efbb0339daaa66c8b68b1e7.16866166…
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0dd8adc7a159..cf5f230360f5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17217,9 +17217,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
}
/* finally lock prog and jit images for all functions and
- * populate kallsysm
+ * populate kallsysm. Begin at the first subprogram, since
+ * bpf_prog_load will add the kallsyms for the main program.
*/
- for (i = 0; i < env->subprog_cnt; i++) {
+ for (i = 1; i < env->subprog_cnt; i++) {
bpf_prog_lock_ro(func[i]);
bpf_prog_kallsyms_add(func[i]);
}
@@ -17245,6 +17246,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
prog->jited = 1;
prog->bpf_func = func[0]->bpf_func;
prog->jited_len = func[0]->jited_len;
+ prog->aux->extable = func[0]->aux->extable;
+ prog->aux->num_exentries = func[0]->aux->num_exentries;
prog->aux->func = func;
prog->aux->func_cnt = env->subprog_cnt;
bpf_prog_jit_attempt_done(prog);
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062324-vending-reseller-25cf@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfaed3e1fa7099de8de4e89cbe7eb9c1bca27dfe Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 7 Jun 2023 10:56:00 +1000
Subject: [PATCH] powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs
enabled
Switching mm and tinkering with current->active_mm should be done with
irqs disabled. There is a path where exit_lazy_flush_tlb can be called
with irqs enabled:
exit_lazy_flush_tlb
flush_type_needed
__flush_all_mm
tlb_finish_mmu
exit_mmap
Which results in the switching being done with irqs enabled, which is
incorrect.
Fixes: a665eec0a22e ("powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm")
Cc: stable(a)vger.kernel.org # v5.10+
Reported-by: Sachin Sant <sachinp(a)linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/A9A5D83D-BA70-47A4-BCB4-30C1AE19BC22@l…
Tested-by: Sachin Sant <sachinp(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230607005601.583293-1-npiggin@gmail.com
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index ce804b7bf84e..0bd4866d9824 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -795,12 +795,20 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
goto out;
if (current->active_mm == mm) {
+ unsigned long flags;
+
WARN_ON_ONCE(current->mm != NULL);
- /* Is a kernel thread and is using mm as the lazy tlb */
+ /*
+ * It is a kernel thread and is using mm as the lazy tlb, so
+ * switch it to init_mm. This is not always called from IPI
+ * (e.g., flush_type_needed), so must disable irqs.
+ */
+ local_irq_save(flags);
mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
switch_mm_irqs_off(mm, &init_mm, current);
mmdrop_lazy_tlb(mm);
+ local_irq_restore(flags);
}
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 0108a4e9f3584a7a2c026d1601b0682ff7335d95
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062318-urethane-vanquish-c710@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0108a4e9f3584a7a2c026d1601b0682ff7335d95 Mon Sep 17 00:00:00 2001
From: Krister Johansen <kjlx(a)templeofstupid.com>
Date: Mon, 12 Jun 2023 17:44:40 -0700
Subject: [PATCH] bpf: ensure main program has an extable
When subprograms are in use, the main program is not jit'd after the
subprograms because jit_subprogs sets a value for prog->bpf_func upon
success. Subsequent calls to the JIT are bypassed when this value is
non-NULL. This leads to a situation where the main program and its
func[0] counterpart are both in the bpf kallsyms tree, but only func[0]
has an extable. Extables are only created during JIT. Now there are
two nearly identical program ksym entries in the tree, but only one has
an extable. Depending upon how the entries are placed, there's a chance
that a fault will call search_extable on the aux with the NULL entry.
Since jit_subprogs already copies state from func[0] to the main
program, include the extable pointer in this state duplication.
Additionally, ensure that the copy of the main program in func[0] is not
added to the bpf_prog_kallsyms table. Instead, let the main program get
added later in bpf_prog_load(). This ensures there is only a single
copy of the main program in the kallsyms table, and that its tag matches
the tag observed by tooling like bpftool.
Cc: stable(a)vger.kernel.org
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com>
Acked-by: Yonghong Song <yhs(a)fb.com>
Acked-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Link: https://lore.kernel.org/r/6de9b2f4b4724ef56efbb0339daaa66c8b68b1e7.16866166…
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0dd8adc7a159..cf5f230360f5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17217,9 +17217,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
}
/* finally lock prog and jit images for all functions and
- * populate kallsysm
+ * populate kallsysm. Begin at the first subprogram, since
+ * bpf_prog_load will add the kallsyms for the main program.
*/
- for (i = 0; i < env->subprog_cnt; i++) {
+ for (i = 1; i < env->subprog_cnt; i++) {
bpf_prog_lock_ro(func[i]);
bpf_prog_kallsyms_add(func[i]);
}
@@ -17245,6 +17246,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
prog->jited = 1;
prog->bpf_func = func[0]->bpf_func;
prog->jited_len = func[0]->jited_len;
+ prog->aux->extable = func[0]->aux->extable;
+ prog->aux->num_exentries = func[0]->aux->num_exentries;
prog->aux->func = func;
prog->aux->func_cnt = env->subprog_cnt;
bpf_prog_jit_attempt_done(prog);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 0108a4e9f3584a7a2c026d1601b0682ff7335d95
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062306-unretired-rockband-a7d9@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0108a4e9f3584a7a2c026d1601b0682ff7335d95 Mon Sep 17 00:00:00 2001
From: Krister Johansen <kjlx(a)templeofstupid.com>
Date: Mon, 12 Jun 2023 17:44:40 -0700
Subject: [PATCH] bpf: ensure main program has an extable
When subprograms are in use, the main program is not jit'd after the
subprograms because jit_subprogs sets a value for prog->bpf_func upon
success. Subsequent calls to the JIT are bypassed when this value is
non-NULL. This leads to a situation where the main program and its
func[0] counterpart are both in the bpf kallsyms tree, but only func[0]
has an extable. Extables are only created during JIT. Now there are
two nearly identical program ksym entries in the tree, but only one has
an extable. Depending upon how the entries are placed, there's a chance
that a fault will call search_extable on the aux with the NULL entry.
Since jit_subprogs already copies state from func[0] to the main
program, include the extable pointer in this state duplication.
Additionally, ensure that the copy of the main program in func[0] is not
added to the bpf_prog_kallsyms table. Instead, let the main program get
added later in bpf_prog_load(). This ensures there is only a single
copy of the main program in the kallsyms table, and that its tag matches
the tag observed by tooling like bpftool.
Cc: stable(a)vger.kernel.org
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com>
Acked-by: Yonghong Song <yhs(a)fb.com>
Acked-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Link: https://lore.kernel.org/r/6de9b2f4b4724ef56efbb0339daaa66c8b68b1e7.16866166…
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0dd8adc7a159..cf5f230360f5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17217,9 +17217,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
}
/* finally lock prog and jit images for all functions and
- * populate kallsysm
+ * populate kallsysm. Begin at the first subprogram, since
+ * bpf_prog_load will add the kallsyms for the main program.
*/
- for (i = 0; i < env->subprog_cnt; i++) {
+ for (i = 1; i < env->subprog_cnt; i++) {
bpf_prog_lock_ro(func[i]);
bpf_prog_kallsyms_add(func[i]);
}
@@ -17245,6 +17246,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
prog->jited = 1;
prog->bpf_func = func[0]->bpf_func;
prog->jited_len = func[0]->jited_len;
+ prog->aux->extable = func[0]->aux->extable;
+ prog->aux->num_exentries = func[0]->aux->num_exentries;
prog->aux->func = func;
prog->aux->func_cnt = env->subprog_cnt;
bpf_prog_jit_attempt_done(prog);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0108a4e9f3584a7a2c026d1601b0682ff7335d95
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062353-unicycle-fang-a423@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0108a4e9f3584a7a2c026d1601b0682ff7335d95 Mon Sep 17 00:00:00 2001
From: Krister Johansen <kjlx(a)templeofstupid.com>
Date: Mon, 12 Jun 2023 17:44:40 -0700
Subject: [PATCH] bpf: ensure main program has an extable
When subprograms are in use, the main program is not jit'd after the
subprograms because jit_subprogs sets a value for prog->bpf_func upon
success. Subsequent calls to the JIT are bypassed when this value is
non-NULL. This leads to a situation where the main program and its
func[0] counterpart are both in the bpf kallsyms tree, but only func[0]
has an extable. Extables are only created during JIT. Now there are
two nearly identical program ksym entries in the tree, but only one has
an extable. Depending upon how the entries are placed, there's a chance
that a fault will call search_extable on the aux with the NULL entry.
Since jit_subprogs already copies state from func[0] to the main
program, include the extable pointer in this state duplication.
Additionally, ensure that the copy of the main program in func[0] is not
added to the bpf_prog_kallsyms table. Instead, let the main program get
added later in bpf_prog_load(). This ensures there is only a single
copy of the main program in the kallsyms table, and that its tag matches
the tag observed by tooling like bpftool.
Cc: stable(a)vger.kernel.org
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com>
Acked-by: Yonghong Song <yhs(a)fb.com>
Acked-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Link: https://lore.kernel.org/r/6de9b2f4b4724ef56efbb0339daaa66c8b68b1e7.16866166…
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0dd8adc7a159..cf5f230360f5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17217,9 +17217,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
}
/* finally lock prog and jit images for all functions and
- * populate kallsysm
+ * populate kallsysm. Begin at the first subprogram, since
+ * bpf_prog_load will add the kallsyms for the main program.
*/
- for (i = 0; i < env->subprog_cnt; i++) {
+ for (i = 1; i < env->subprog_cnt; i++) {
bpf_prog_lock_ro(func[i]);
bpf_prog_kallsyms_add(func[i]);
}
@@ -17245,6 +17246,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
prog->jited = 1;
prog->bpf_func = func[0]->bpf_func;
prog->jited_len = func[0]->jited_len;
+ prog->aux->extable = func[0]->aux->extable;
+ prog->aux->num_exentries = func[0]->aux->num_exentries;
prog->aux->func = func;
prog->aux->func_cnt = env->subprog_cnt;
bpf_prog_jit_attempt_done(prog);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 78d0d2063bab954d19a1696feae4c7706a626d48
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062303-urging-acutely-c0f9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 78d0d2063bab954d19a1696feae4c7706a626d48 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 19 Jun 2023 09:41:05 -0600
Subject: [PATCH] io_uring/net: disable partial retries for recvmsg with cmsg
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kerne…
Cc: stable(a)vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze(a)samba.org>
Reviewed-by: Stefan Metzmacher <metze(a)samba.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/net.c b/io_uring/net.c
index c0924ab1ea11..2bc2cb2f4d6c 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -789,16 +789,19 @@ retry_multishot:
flags = sr->msg_flags;
if (force_nonblock)
flags |= MSG_DONTWAIT;
- if (flags & MSG_WAITALL)
- min_ret = iov_iter_count(&kmsg->msg.msg_iter);
kmsg->msg.msg_get_inq = 1;
- if (req->flags & REQ_F_APOLL_MULTISHOT)
+ if (req->flags & REQ_F_APOLL_MULTISHOT) {
ret = io_recvmsg_multishot(sock, sr, kmsg, flags,
&mshot_finished);
- else
+ } else {
+ /* disable partial retry for recvmsg with cmsg attached */
+ if (flags & MSG_WAITALL && !kmsg->msg.msg_controllen)
+ min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg,
kmsg->uaddr, flags);
+ }
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 78d0d2063bab954d19a1696feae4c7706a626d48
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062302-supremacy-backboned-b263@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 78d0d2063bab954d19a1696feae4c7706a626d48 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 19 Jun 2023 09:41:05 -0600
Subject: [PATCH] io_uring/net: disable partial retries for recvmsg with cmsg
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kerne…
Cc: stable(a)vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze(a)samba.org>
Reviewed-by: Stefan Metzmacher <metze(a)samba.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/net.c b/io_uring/net.c
index c0924ab1ea11..2bc2cb2f4d6c 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -789,16 +789,19 @@ retry_multishot:
flags = sr->msg_flags;
if (force_nonblock)
flags |= MSG_DONTWAIT;
- if (flags & MSG_WAITALL)
- min_ret = iov_iter_count(&kmsg->msg.msg_iter);
kmsg->msg.msg_get_inq = 1;
- if (req->flags & REQ_F_APOLL_MULTISHOT)
+ if (req->flags & REQ_F_APOLL_MULTISHOT) {
ret = io_recvmsg_multishot(sock, sr, kmsg, flags,
&mshot_finished);
- else
+ } else {
+ /* disable partial retry for recvmsg with cmsg attached */
+ if (flags & MSG_WAITALL && !kmsg->msg.msg_controllen)
+ min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg,
kmsg->uaddr, flags);
+ }
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b1dc492087db0f2e5a45f1072a743d04618dd6be
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062329-stem-resize-ab95@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1dc492087db0f2e5a45f1072a743d04618dd6be Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 19 Jun 2023 09:35:34 -0600
Subject: [PATCH] io_uring/net: clear msg_controllen on partial sendmsg retry
If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.
Cc: stable(a)vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze(a)samba.org>
Reviewed-by: Stefan Metzmacher <metze(a)samba.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/net.c b/io_uring/net.c
index 51b0f7fbb4f5..c0924ab1ea11 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -326,6 +326,8 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
return io_setup_async_msg(req, kmsg, issue_flags);
if (ret > 0 && io_net_retry(sock, flags)) {
+ kmsg->msg.msg_controllen = 0;
+ kmsg->msg.msg_control = NULL;
sr->done_io += ret;
req->flags |= REQ_F_PARTIAL_IO;
return io_setup_async_msg(req, kmsg, issue_flags);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b1dc492087db0f2e5a45f1072a743d04618dd6be
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062328-parka-debtor-1040@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1dc492087db0f2e5a45f1072a743d04618dd6be Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 19 Jun 2023 09:35:34 -0600
Subject: [PATCH] io_uring/net: clear msg_controllen on partial sendmsg retry
If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.
Cc: stable(a)vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze(a)samba.org>
Reviewed-by: Stefan Metzmacher <metze(a)samba.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/net.c b/io_uring/net.c
index 51b0f7fbb4f5..c0924ab1ea11 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -326,6 +326,8 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
return io_setup_async_msg(req, kmsg, issue_flags);
if (ret > 0 && io_net_retry(sock, flags)) {
+ kmsg->msg.msg_controllen = 0;
+ kmsg->msg.msg_control = NULL;
sr->done_io += ret;
req->flags |= REQ_F_PARTIAL_IO;
return io_setup_async_msg(req, kmsg, issue_flags);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 067d6ec7ed5b49380688e06c1e5f883a71bef4fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062352-unrivaled-feline-e1a2@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 067d6ec7ed5b49380688e06c1e5f883a71bef4fe Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:51 -0700
Subject: [PATCH] PCI: hv: Add a per-bus mutex state_lock
In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(): when
hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
loaded, the PCI device driver's probe() function is already called in
create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
to call the PCI device driver's remove() function and remove the device
reliably; if the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called; if the PCI device driver's probe() is already
running (e.g., systemd-udev is loading the PCI device driver), it must
be holding the per-device lock, and after the probe() finishes and releases
the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
able to proceed to remove the device reliably.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 1a5296fad1c4..2d93d0c4f10d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -489,7 +489,10 @@ struct hv_pcibus_device {
struct fwnode_handle *fwnode;
/* Protocol version negotiated with the host */
enum pci_protocol_version_t protocol_version;
+
+ struct mutex state_lock;
enum hv_pcibus_state state;
+
struct hv_device *hdev;
resource_size_t low_mmio_space;
resource_size_t high_mmio_space;
@@ -2605,6 +2608,8 @@ static void pci_devices_present_work(struct work_struct *work)
if (!dr)
return;
+ mutex_lock(&hbus->state_lock);
+
/* First, mark all existing children as reported missing. */
spin_lock_irqsave(&hbus->device_list_lock, flags);
list_for_each_entry(hpdev, &hbus->children, list_entry) {
@@ -2686,6 +2691,8 @@ static void pci_devices_present_work(struct work_struct *work)
break;
}
+ mutex_unlock(&hbus->state_lock);
+
kfree(dr);
}
@@ -2834,6 +2841,8 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
+ mutex_lock(&hbus->state_lock);
+
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2870,6 +2879,8 @@ static void hv_eject_device_work(struct work_struct *work)
put_pcichild(hpdev);
put_pcichild(hpdev);
/* hpdev has been freed. Do not use it any more. */
+
+ mutex_unlock(&hbus->state_lock);
}
/**
@@ -3636,6 +3647,7 @@ static int hv_pci_probe(struct hv_device *hdev,
return -ENOMEM;
hbus->bridge = bridge;
+ mutex_init(&hbus->state_lock);
hbus->state = hv_pcibus_init;
hbus->wslot_res_allocated = -1;
@@ -3745,9 +3757,11 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_irq_domain;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto free_irq_domain;
+ goto release_state_lock;
ret = hv_pci_allocate_bridge_windows(hbus);
if (ret)
@@ -3765,12 +3779,15 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_windows;
+ mutex_unlock(&hbus->state_lock);
return 0;
free_windows:
hv_pci_free_bridge_windows(hbus);
exit_d0:
(void) hv_pci_bus_exit(hdev, true);
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
free_irq_domain:
irq_domain_remove(hbus->irq_domain);
free_fwnode:
@@ -4020,20 +4037,26 @@ static int hv_pci_resume(struct hv_device *hdev)
if (ret)
goto out;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
ret = hv_send_resources_allocated(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
prepopulate_bars(hbus);
hv_pci_restore_msi_state(hbus);
hbus->state = hv_pcibus_installed;
+ mutex_unlock(&hbus->state_lock);
return 0;
+
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
out:
vmbus_close(hdev->channel);
return ret;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 067d6ec7ed5b49380688e06c1e5f883a71bef4fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062350-companion-debtless-8e8e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 067d6ec7ed5b49380688e06c1e5f883a71bef4fe Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:51 -0700
Subject: [PATCH] PCI: hv: Add a per-bus mutex state_lock
In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(): when
hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
loaded, the PCI device driver's probe() function is already called in
create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
to call the PCI device driver's remove() function and remove the device
reliably; if the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called; if the PCI device driver's probe() is already
running (e.g., systemd-udev is loading the PCI device driver), it must
be holding the per-device lock, and after the probe() finishes and releases
the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
able to proceed to remove the device reliably.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 1a5296fad1c4..2d93d0c4f10d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -489,7 +489,10 @@ struct hv_pcibus_device {
struct fwnode_handle *fwnode;
/* Protocol version negotiated with the host */
enum pci_protocol_version_t protocol_version;
+
+ struct mutex state_lock;
enum hv_pcibus_state state;
+
struct hv_device *hdev;
resource_size_t low_mmio_space;
resource_size_t high_mmio_space;
@@ -2605,6 +2608,8 @@ static void pci_devices_present_work(struct work_struct *work)
if (!dr)
return;
+ mutex_lock(&hbus->state_lock);
+
/* First, mark all existing children as reported missing. */
spin_lock_irqsave(&hbus->device_list_lock, flags);
list_for_each_entry(hpdev, &hbus->children, list_entry) {
@@ -2686,6 +2691,8 @@ static void pci_devices_present_work(struct work_struct *work)
break;
}
+ mutex_unlock(&hbus->state_lock);
+
kfree(dr);
}
@@ -2834,6 +2841,8 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
+ mutex_lock(&hbus->state_lock);
+
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2870,6 +2879,8 @@ static void hv_eject_device_work(struct work_struct *work)
put_pcichild(hpdev);
put_pcichild(hpdev);
/* hpdev has been freed. Do not use it any more. */
+
+ mutex_unlock(&hbus->state_lock);
}
/**
@@ -3636,6 +3647,7 @@ static int hv_pci_probe(struct hv_device *hdev,
return -ENOMEM;
hbus->bridge = bridge;
+ mutex_init(&hbus->state_lock);
hbus->state = hv_pcibus_init;
hbus->wslot_res_allocated = -1;
@@ -3745,9 +3757,11 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_irq_domain;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto free_irq_domain;
+ goto release_state_lock;
ret = hv_pci_allocate_bridge_windows(hbus);
if (ret)
@@ -3765,12 +3779,15 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_windows;
+ mutex_unlock(&hbus->state_lock);
return 0;
free_windows:
hv_pci_free_bridge_windows(hbus);
exit_d0:
(void) hv_pci_bus_exit(hdev, true);
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
free_irq_domain:
irq_domain_remove(hbus->irq_domain);
free_fwnode:
@@ -4020,20 +4037,26 @@ static int hv_pci_resume(struct hv_device *hdev)
if (ret)
goto out;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
ret = hv_send_resources_allocated(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
prepopulate_bars(hbus);
hv_pci_restore_msi_state(hbus);
hbus->state = hv_pcibus_installed;
+ mutex_unlock(&hbus->state_lock);
return 0;
+
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
out:
vmbus_close(hdev->channel);
return ret;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 067d6ec7ed5b49380688e06c1e5f883a71bef4fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062348-spout-suitcase-a356@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 067d6ec7ed5b49380688e06c1e5f883a71bef4fe Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:51 -0700
Subject: [PATCH] PCI: hv: Add a per-bus mutex state_lock
In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(): when
hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
loaded, the PCI device driver's probe() function is already called in
create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
to call the PCI device driver's remove() function and remove the device
reliably; if the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called; if the PCI device driver's probe() is already
running (e.g., systemd-udev is loading the PCI device driver), it must
be holding the per-device lock, and after the probe() finishes and releases
the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
able to proceed to remove the device reliably.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 1a5296fad1c4..2d93d0c4f10d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -489,7 +489,10 @@ struct hv_pcibus_device {
struct fwnode_handle *fwnode;
/* Protocol version negotiated with the host */
enum pci_protocol_version_t protocol_version;
+
+ struct mutex state_lock;
enum hv_pcibus_state state;
+
struct hv_device *hdev;
resource_size_t low_mmio_space;
resource_size_t high_mmio_space;
@@ -2605,6 +2608,8 @@ static void pci_devices_present_work(struct work_struct *work)
if (!dr)
return;
+ mutex_lock(&hbus->state_lock);
+
/* First, mark all existing children as reported missing. */
spin_lock_irqsave(&hbus->device_list_lock, flags);
list_for_each_entry(hpdev, &hbus->children, list_entry) {
@@ -2686,6 +2691,8 @@ static void pci_devices_present_work(struct work_struct *work)
break;
}
+ mutex_unlock(&hbus->state_lock);
+
kfree(dr);
}
@@ -2834,6 +2841,8 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
+ mutex_lock(&hbus->state_lock);
+
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2870,6 +2879,8 @@ static void hv_eject_device_work(struct work_struct *work)
put_pcichild(hpdev);
put_pcichild(hpdev);
/* hpdev has been freed. Do not use it any more. */
+
+ mutex_unlock(&hbus->state_lock);
}
/**
@@ -3636,6 +3647,7 @@ static int hv_pci_probe(struct hv_device *hdev,
return -ENOMEM;
hbus->bridge = bridge;
+ mutex_init(&hbus->state_lock);
hbus->state = hv_pcibus_init;
hbus->wslot_res_allocated = -1;
@@ -3745,9 +3757,11 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_irq_domain;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto free_irq_domain;
+ goto release_state_lock;
ret = hv_pci_allocate_bridge_windows(hbus);
if (ret)
@@ -3765,12 +3779,15 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_windows;
+ mutex_unlock(&hbus->state_lock);
return 0;
free_windows:
hv_pci_free_bridge_windows(hbus);
exit_d0:
(void) hv_pci_bus_exit(hdev, true);
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
free_irq_domain:
irq_domain_remove(hbus->irq_domain);
free_fwnode:
@@ -4020,20 +4037,26 @@ static int hv_pci_resume(struct hv_device *hdev)
if (ret)
goto out;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
ret = hv_send_resources_allocated(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
prepopulate_bars(hbus);
hv_pci_restore_msi_state(hbus);
hbus->state = hv_pcibus_installed;
+ mutex_unlock(&hbus->state_lock);
return 0;
+
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
out:
vmbus_close(hdev->channel);
return ret;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 067d6ec7ed5b49380688e06c1e5f883a71bef4fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062346-consuming-staunch-e0a5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 067d6ec7ed5b49380688e06c1e5f883a71bef4fe Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:51 -0700
Subject: [PATCH] PCI: hv: Add a per-bus mutex state_lock
In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(): when
hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
loaded, the PCI device driver's probe() function is already called in
create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
to call the PCI device driver's remove() function and remove the device
reliably; if the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called; if the PCI device driver's probe() is already
running (e.g., systemd-udev is loading the PCI device driver), it must
be holding the per-device lock, and after the probe() finishes and releases
the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
able to proceed to remove the device reliably.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 1a5296fad1c4..2d93d0c4f10d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -489,7 +489,10 @@ struct hv_pcibus_device {
struct fwnode_handle *fwnode;
/* Protocol version negotiated with the host */
enum pci_protocol_version_t protocol_version;
+
+ struct mutex state_lock;
enum hv_pcibus_state state;
+
struct hv_device *hdev;
resource_size_t low_mmio_space;
resource_size_t high_mmio_space;
@@ -2605,6 +2608,8 @@ static void pci_devices_present_work(struct work_struct *work)
if (!dr)
return;
+ mutex_lock(&hbus->state_lock);
+
/* First, mark all existing children as reported missing. */
spin_lock_irqsave(&hbus->device_list_lock, flags);
list_for_each_entry(hpdev, &hbus->children, list_entry) {
@@ -2686,6 +2691,8 @@ static void pci_devices_present_work(struct work_struct *work)
break;
}
+ mutex_unlock(&hbus->state_lock);
+
kfree(dr);
}
@@ -2834,6 +2841,8 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
+ mutex_lock(&hbus->state_lock);
+
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2870,6 +2879,8 @@ static void hv_eject_device_work(struct work_struct *work)
put_pcichild(hpdev);
put_pcichild(hpdev);
/* hpdev has been freed. Do not use it any more. */
+
+ mutex_unlock(&hbus->state_lock);
}
/**
@@ -3636,6 +3647,7 @@ static int hv_pci_probe(struct hv_device *hdev,
return -ENOMEM;
hbus->bridge = bridge;
+ mutex_init(&hbus->state_lock);
hbus->state = hv_pcibus_init;
hbus->wslot_res_allocated = -1;
@@ -3745,9 +3757,11 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_irq_domain;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto free_irq_domain;
+ goto release_state_lock;
ret = hv_pci_allocate_bridge_windows(hbus);
if (ret)
@@ -3765,12 +3779,15 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_windows;
+ mutex_unlock(&hbus->state_lock);
return 0;
free_windows:
hv_pci_free_bridge_windows(hbus);
exit_d0:
(void) hv_pci_bus_exit(hdev, true);
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
free_irq_domain:
irq_domain_remove(hbus->irq_domain);
free_fwnode:
@@ -4020,20 +4037,26 @@ static int hv_pci_resume(struct hv_device *hdev)
if (ret)
goto out;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
ret = hv_send_resources_allocated(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
prepopulate_bars(hbus);
hv_pci_restore_msi_state(hbus);
hbus->state = hv_pcibus_installed;
+ mutex_unlock(&hbus->state_lock);
return 0;
+
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
out:
vmbus_close(hdev->channel);
return ret;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 2738d5ab7929a845b654cd171a1e275c37eb428e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062330-shucking-avenue-4517@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2738d5ab7929a845b654cd171a1e275c37eb428e Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:48 -0700
Subject: [PATCH] PCI: hv: Fix a race condition in hv_irq_unmask() that can
cause panic
When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work;
if the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
hv_eject_device_work(); if the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
leave data->chip_data with its default value of NULL; later, when the PCI
device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index ea8862e656b6..733637d96765 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -635,6 +635,11 @@ static void hv_arch_irq_unmask(struct irq_data *data)
pbus = pdev->bus;
hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
int_desc = data->chip_data;
+ if (!int_desc) {
+ dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n",
+ __func__, data->irq);
+ return;
+ }
local_irq_save(flags);
@@ -2004,12 +2009,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
hv_pci_onchannelcallback(hbus);
spin_unlock_irqrestore(&channel->sched_lock, flags);
- if (hpdev->state == hv_pcichild_ejecting) {
- dev_err_once(&hbus->hdev->device,
- "the device is being ejected\n");
- goto enable_tasklet;
- }
-
udelay(100);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2738d5ab7929a845b654cd171a1e275c37eb428e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062328-harmonics-nastily-a8fd@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2738d5ab7929a845b654cd171a1e275c37eb428e Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:48 -0700
Subject: [PATCH] PCI: hv: Fix a race condition in hv_irq_unmask() that can
cause panic
When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work;
if the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
hv_eject_device_work(); if the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
leave data->chip_data with its default value of NULL; later, when the PCI
device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index ea8862e656b6..733637d96765 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -635,6 +635,11 @@ static void hv_arch_irq_unmask(struct irq_data *data)
pbus = pdev->bus;
hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
int_desc = data->chip_data;
+ if (!int_desc) {
+ dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n",
+ __func__, data->irq);
+ return;
+ }
local_irq_save(flags);
@@ -2004,12 +2009,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
hv_pci_onchannelcallback(hbus);
spin_unlock_irqrestore(&channel->sched_lock, flags);
- if (hpdev->state == hv_pcichild_ejecting) {
- dev_err_once(&hbus->hdev->device,
- "the device is being ejected\n");
- goto enable_tasklet;
- }
-
udelay(100);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2738d5ab7929a845b654cd171a1e275c37eb428e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062326-darkish-unblended-ea7f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2738d5ab7929a845b654cd171a1e275c37eb428e Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:48 -0700
Subject: [PATCH] PCI: hv: Fix a race condition in hv_irq_unmask() that can
cause panic
When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work;
if the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
hv_eject_device_work(); if the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
leave data->chip_data with its default value of NULL; later, when the PCI
device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index ea8862e656b6..733637d96765 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -635,6 +635,11 @@ static void hv_arch_irq_unmask(struct irq_data *data)
pbus = pdev->bus;
hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
int_desc = data->chip_data;
+ if (!int_desc) {
+ dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n",
+ __func__, data->irq);
+ return;
+ }
local_irq_save(flags);
@@ -2004,12 +2009,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
hv_pci_onchannelcallback(hbus);
spin_unlock_irqrestore(&channel->sched_lock, flags);
- if (hpdev->state == hv_pcichild_ejecting) {
- dev_err_once(&hbus->hdev->device,
- "the device is being ejected\n");
- goto enable_tasklet;
- }
-
udelay(100);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x add9195e69c94b32e96f78c2f9cea68f0e850b3f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062307-reversion-wasp-4205@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From add9195e69c94b32e96f78c2f9cea68f0e850b3f Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:49 -0700
Subject: [PATCH] PCI: hv: Remove the useless hv_pcichild_state from struct
hv_pci_dev
The hpdev->state is never really useful. The only use in
hv_pci_eject_device() and hv_eject_device_work() is not really necessary.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-4-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 733637d96765..a826b41c949a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -545,19 +545,10 @@ struct hv_dr_state {
struct hv_pcidev_description func[];
};
-enum hv_pcichild_state {
- hv_pcichild_init = 0,
- hv_pcichild_requirements,
- hv_pcichild_resourced,
- hv_pcichild_ejecting,
- hv_pcichild_maximum
-};
-
struct hv_pci_dev {
/* List protected by pci_rescan_remove_lock */
struct list_head list_entry;
refcount_t refs;
- enum hv_pcichild_state state;
struct pci_slot *pci_slot;
struct hv_pcidev_description desc;
bool reported_missing;
@@ -2843,8 +2834,6 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
- WARN_ON(hpdev->state != hv_pcichild_ejecting);
-
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2901,7 +2890,6 @@ static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
return;
}
- hpdev->state = hv_pcichild_ejecting;
get_pcichild(hpdev);
INIT_WORK(&hpdev->wrk, hv_eject_device_work);
queue_work(hbus->wq, &hpdev->wrk);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x add9195e69c94b32e96f78c2f9cea68f0e850b3f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062306-sphinx-acid-52ac@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From add9195e69c94b32e96f78c2f9cea68f0e850b3f Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:49 -0700
Subject: [PATCH] PCI: hv: Remove the useless hv_pcichild_state from struct
hv_pci_dev
The hpdev->state is never really useful. The only use in
hv_pci_eject_device() and hv_eject_device_work() is not really necessary.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-4-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 733637d96765..a826b41c949a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -545,19 +545,10 @@ struct hv_dr_state {
struct hv_pcidev_description func[];
};
-enum hv_pcichild_state {
- hv_pcichild_init = 0,
- hv_pcichild_requirements,
- hv_pcichild_resourced,
- hv_pcichild_ejecting,
- hv_pcichild_maximum
-};
-
struct hv_pci_dev {
/* List protected by pci_rescan_remove_lock */
struct list_head list_entry;
refcount_t refs;
- enum hv_pcichild_state state;
struct pci_slot *pci_slot;
struct hv_pcidev_description desc;
bool reported_missing;
@@ -2843,8 +2834,6 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
- WARN_ON(hpdev->state != hv_pcichild_ejecting);
-
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2901,7 +2890,6 @@ static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
return;
}
- hpdev->state = hv_pcichild_ejecting;
get_pcichild(hpdev);
INIT_WORK(&hpdev->wrk, hv_eject_device_work);
queue_work(hbus->wq, &hpdev->wrk);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x add9195e69c94b32e96f78c2f9cea68f0e850b3f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062305-bogus-footing-aa7f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From add9195e69c94b32e96f78c2f9cea68f0e850b3f Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:49 -0700
Subject: [PATCH] PCI: hv: Remove the useless hv_pcichild_state from struct
hv_pci_dev
The hpdev->state is never really useful. The only use in
hv_pci_eject_device() and hv_eject_device_work() is not really necessary.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-4-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 733637d96765..a826b41c949a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -545,19 +545,10 @@ struct hv_dr_state {
struct hv_pcidev_description func[];
};
-enum hv_pcichild_state {
- hv_pcichild_init = 0,
- hv_pcichild_requirements,
- hv_pcichild_resourced,
- hv_pcichild_ejecting,
- hv_pcichild_maximum
-};
-
struct hv_pci_dev {
/* List protected by pci_rescan_remove_lock */
struct list_head list_entry;
refcount_t refs;
- enum hv_pcichild_state state;
struct pci_slot *pci_slot;
struct hv_pcidev_description desc;
bool reported_missing;
@@ -2843,8 +2834,6 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
- WARN_ON(hpdev->state != hv_pcichild_ejecting);
-
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2901,7 +2890,6 @@ static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
return;
}
- hpdev->state = hv_pcichild_ejecting;
get_pcichild(hpdev);
INIT_WORK(&hpdev->wrk, hv_eject_device_work);
queue_work(hbus->wq, &hpdev->wrk);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 440b5e3663271b0ffbd4908115044a6a51fb938b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062321-recreate-donation-85c8@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 440b5e3663271b0ffbd4908115044a6a51fb938b Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Wed, 14 Jun 2023 21:44:47 -0700
Subject: [PATCH] PCI: hv: Fix a race condition bug in hv_pci_query_relations()
Since day 1 of the driver, there has been a race between
hv_pci_query_relations() and survey_child_resources(): during fast
device hotplug, hv_pci_query_relations() may error out due to
device-remove and the stack variable 'comp' is no longer valid;
however, pci_devices_present_work() -> survey_child_resources() ->
complete() may be running on another CPU and accessing the no-longer-valid
'comp'. Fix the race by flushing the workqueue before we exit from
hv_pci_query_relations().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-2-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index bc32662c6bb7..ea8862e656b6 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3401,6 +3401,24 @@ static int hv_pci_query_relations(struct hv_device *hdev)
if (!ret)
ret = wait_for_response(hdev, &comp);
+ /*
+ * In the case of fast device addition/removal, it's possible that
+ * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
+ * already got a PCI_BUS_RELATIONS* message from the host and the
+ * channel callback already scheduled a work to hbus->wq, which can be
+ * running pci_devices_present_work() -> survey_child_resources() ->
+ * complete(&hbus->survey_event), even after hv_pci_query_relations()
+ * exits and the stack variable 'comp' is no longer valid; as a result,
+ * a hang or a page fault may happen when the complete() calls
+ * raw_spin_lock_irqsave(). Flush hbus->wq before we exit from
+ * hv_pci_query_relations() to avoid the issues. Note: if 'ret' is
+ * -ENODEV, there can't be any more work item scheduled to hbus->wq
+ * after the flush_workqueue(): see vmbus_onoffer_rescind() ->
+ * vmbus_reset_channel_cb(), vmbus_rescind_cleanup() ->
+ * channel->rescind = true.
+ */
+ flush_workqueue(hbus->wq);
+
return ret;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ec97e112985c2581ee61854a4b74f080f6cdfc2c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062309-wilder-grading-8bc3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ec97e112985c2581ee61854a4b74f080f6cdfc2c Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Thu, 4 May 2023 15:41:55 -0700
Subject: [PATCH] Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc()
fails
Commit 572086325ce9 ("Drivers: hv: vmbus: Cleanup synic memory free path")
says "Any memory allocations that succeeded will be freed when the caller
cleans up by calling hv_synic_free()", but if the get_zeroed_page() in
hv_synic_alloc() fails, currently hv_synic_free() is not really called
in vmbus_bus_init(), consequently there will be a memory leak, e.g.
hv_context.hv_numa_map is not freed in the error path. Fix this by
updating the goto labels.
Cc: stable(a)kernel.org
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Link: https://lore.kernel.org/r/20230504224155.10484-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 1c65a6dfb9fa..67f95a29aeca 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1372,7 +1372,7 @@ static int vmbus_bus_init(void)
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
hv_synic_init, hv_synic_cleanup);
if (ret < 0)
- goto err_cpuhp;
+ goto err_alloc;
hyperv_cpuhp_online = ret;
ret = vmbus_connect();
@@ -1392,9 +1392,8 @@ static int vmbus_bus_init(void)
err_connect:
cpuhp_remove_state(hyperv_cpuhp_online);
-err_cpuhp:
- hv_synic_free();
err_alloc:
+ hv_synic_free();
if (vmbus_irq == -1) {
hv_remove_vmbus_handler();
} else {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 2230f9e1171a2e9731422a14d1bbc313c0b719d1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062318-spore-raider-70ae@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2230f9e1171a2e9731422a14d1bbc313c0b719d1 Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan(a)redhat.com>
Date: Thu, 15 Jun 2023 15:42:59 +1000
Subject: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable(a)vger.kernel.org # v3.9+
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oliver Upton <oliver.upton(a)linux.dev>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Message-Id: <20230615054259.14911-1-gshan(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..65f94f592ff8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
return __kvm_handle_hva_range(kvm, &range);
}
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+ /*
+ * Skipping invalid memslots is correct if and only change_pte() is
+ * surrounded by invalidate_range_{start,end}(), which is currently
+ * guaranteed by the primary MMU. If that ever changes, KVM needs to
+ * unmap the memslot instead of skipping the memslot to ensure that KVM
+ * doesn't hold references to the old PFN.
+ */
+ WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+ if (range->slot->flags & KVM_MEMSLOT_INVALID)
+ return false;
+
+ return kvm_set_spte_gfn(kvm, range);
+}
+
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
return;
- kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+ kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
}
void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2230f9e1171a2e9731422a14d1bbc313c0b719d1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062315-encode-penknife-1a99@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2230f9e1171a2e9731422a14d1bbc313c0b719d1 Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan(a)redhat.com>
Date: Thu, 15 Jun 2023 15:42:59 +1000
Subject: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable(a)vger.kernel.org # v3.9+
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oliver Upton <oliver.upton(a)linux.dev>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Message-Id: <20230615054259.14911-1-gshan(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..65f94f592ff8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
return __kvm_handle_hva_range(kvm, &range);
}
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+ /*
+ * Skipping invalid memslots is correct if and only change_pte() is
+ * surrounded by invalidate_range_{start,end}(), which is currently
+ * guaranteed by the primary MMU. If that ever changes, KVM needs to
+ * unmap the memslot instead of skipping the memslot to ensure that KVM
+ * doesn't hold references to the old PFN.
+ */
+ WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+ if (range->slot->flags & KVM_MEMSLOT_INVALID)
+ return false;
+
+ return kvm_set_spte_gfn(kvm, range);
+}
+
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
return;
- kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+ kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
}
void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2230f9e1171a2e9731422a14d1bbc313c0b719d1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062312-ranch-brought-bb16@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2230f9e1171a2e9731422a14d1bbc313c0b719d1 Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan(a)redhat.com>
Date: Thu, 15 Jun 2023 15:42:59 +1000
Subject: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable(a)vger.kernel.org # v3.9+
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oliver Upton <oliver.upton(a)linux.dev>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Message-Id: <20230615054259.14911-1-gshan(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..65f94f592ff8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
return __kvm_handle_hva_range(kvm, &range);
}
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+ /*
+ * Skipping invalid memslots is correct if and only change_pte() is
+ * surrounded by invalidate_range_{start,end}(), which is currently
+ * guaranteed by the primary MMU. If that ever changes, KVM needs to
+ * unmap the memslot instead of skipping the memslot to ensure that KVM
+ * doesn't hold references to the old PFN.
+ */
+ WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+ if (range->slot->flags & KVM_MEMSLOT_INVALID)
+ return false;
+
+ return kvm_set_spte_gfn(kvm, range);
+}
+
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
return;
- kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+ kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
}
void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2230f9e1171a2e9731422a14d1bbc313c0b719d1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062310-preppy-grief-4bfb@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2230f9e1171a2e9731422a14d1bbc313c0b719d1 Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan(a)redhat.com>
Date: Thu, 15 Jun 2023 15:42:59 +1000
Subject: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable(a)vger.kernel.org # v3.9+
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oliver Upton <oliver.upton(a)linux.dev>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Message-Id: <20230615054259.14911-1-gshan(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..65f94f592ff8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
return __kvm_handle_hva_range(kvm, &range);
}
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+ /*
+ * Skipping invalid memslots is correct if and only change_pte() is
+ * surrounded by invalidate_range_{start,end}(), which is currently
+ * guaranteed by the primary MMU. If that ever changes, KVM needs to
+ * unmap the memslot instead of skipping the memslot to ensure that KVM
+ * doesn't hold references to the old PFN.
+ */
+ WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+ if (range->slot->flags & KVM_MEMSLOT_INVALID)
+ return false;
+
+ return kvm_set_spte_gfn(kvm, range);
+}
+
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
return;
- kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+ kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
}
void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 782e53d0c14420858dbf0f8f797973c150d3b6d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062323-remarry-update-6cc4@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782e53d0c14420858dbf0f8f797973c150d3b6d7 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Mon, 12 Jun 2023 11:14:56 +0900
Subject: [PATCH] nilfs2: prevent general protection fault in
nilfs_clear_dirty_page()
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+53369d11851d8f26735c(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5cf30827f244..b4e54d079b7d 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
- nilfs_clear_dirty_page(&folio->page, silent);
+
+ /*
+ * This folio may have been removed from the address
+ * space by truncation or invalidation when the lock
+ * was acquired. Skip processing in that case.
+ */
+ if (likely(folio->mapping == mapping))
+ nilfs_clear_dirty_page(&folio->page, silent);
+
folio_unlock(folio);
}
folio_batch_release(&fbatch);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 782e53d0c14420858dbf0f8f797973c150d3b6d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062321-coping-trivial-c7fd@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782e53d0c14420858dbf0f8f797973c150d3b6d7 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Mon, 12 Jun 2023 11:14:56 +0900
Subject: [PATCH] nilfs2: prevent general protection fault in
nilfs_clear_dirty_page()
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+53369d11851d8f26735c(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5cf30827f244..b4e54d079b7d 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
- nilfs_clear_dirty_page(&folio->page, silent);
+
+ /*
+ * This folio may have been removed from the address
+ * space by truncation or invalidation when the lock
+ * was acquired. Skip processing in that case.
+ */
+ if (likely(folio->mapping == mapping))
+ nilfs_clear_dirty_page(&folio->page, silent);
+
folio_unlock(folio);
}
folio_batch_release(&fbatch);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 782e53d0c14420858dbf0f8f797973c150d3b6d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062320-transpire-snugly-31ad@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782e53d0c14420858dbf0f8f797973c150d3b6d7 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Mon, 12 Jun 2023 11:14:56 +0900
Subject: [PATCH] nilfs2: prevent general protection fault in
nilfs_clear_dirty_page()
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+53369d11851d8f26735c(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5cf30827f244..b4e54d079b7d 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
- nilfs_clear_dirty_page(&folio->page, silent);
+
+ /*
+ * This folio may have been removed from the address
+ * space by truncation or invalidation when the lock
+ * was acquired. Skip processing in that case.
+ */
+ if (likely(folio->mapping == mapping))
+ nilfs_clear_dirty_page(&folio->page, silent);
+
folio_unlock(folio);
}
folio_batch_release(&fbatch);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 782e53d0c14420858dbf0f8f797973c150d3b6d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062319-designer-nephew-119f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782e53d0c14420858dbf0f8f797973c150d3b6d7 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Mon, 12 Jun 2023 11:14:56 +0900
Subject: [PATCH] nilfs2: prevent general protection fault in
nilfs_clear_dirty_page()
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+53369d11851d8f26735c(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5cf30827f244..b4e54d079b7d 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
- nilfs_clear_dirty_page(&folio->page, silent);
+
+ /*
+ * This folio may have been removed from the address
+ * space by truncation or invalidation when the lock
+ * was acquired. Skip processing in that case.
+ */
+ if (likely(folio->mapping == mapping))
+ nilfs_clear_dirty_page(&folio->page, silent);
+
folio_unlock(folio);
}
folio_batch_release(&fbatch);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 782e53d0c14420858dbf0f8f797973c150d3b6d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062317-gradient-unopened-2f06@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782e53d0c14420858dbf0f8f797973c150d3b6d7 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Mon, 12 Jun 2023 11:14:56 +0900
Subject: [PATCH] nilfs2: prevent general protection fault in
nilfs_clear_dirty_page()
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+53369d11851d8f26735c(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5cf30827f244..b4e54d079b7d 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
- nilfs_clear_dirty_page(&folio->page, silent);
+
+ /*
+ * This folio may have been removed from the address
+ * space by truncation or invalidation when the lock
+ * was acquired. Skip processing in that case.
+ */
+ if (likely(folio->mapping == mapping))
+ nilfs_clear_dirty_page(&folio->page, silent);
+
folio_unlock(folio);
}
folio_batch_release(&fbatch);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 935d44acf621aa0688fef8312dec3e5940f38f4e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062314-saffron-portly-6c2a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 935d44acf621aa0688fef8312dec3e5940f38f4e Mon Sep 17 00:00:00 2001
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Date: Wed, 7 Jun 2023 15:24:27 +0200
Subject: [PATCH] memfd: check for non-NULL file_seals in memfd_create()
syscall
Ensure that file_seals is non-NULL before using it in the memfd_create()
syscall. One situation in which memfd_file_seals_ptr() could return a
NULL pointer when CONFIG_SHMEM=n, oopsing the kernel.
Link: https://lkml.kernel.org/r/20230607132427.2867435-1-roberto.sassu@huaweiclou…
Fixes: 47b9012ecdc7 ("shmem: add sealing support to hugetlb-backed memfd")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: Marc-Andr Lureau <marcandre.lureau(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memfd.c b/mm/memfd.c
index 69b90c31d38c..e763e76f1106 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -371,12 +371,15 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- *file_seals &= ~F_SEAL_SEAL;
- *file_seals |= F_SEAL_EXEC;
+ if (file_seals) {
+ *file_seals &= ~F_SEAL_SEAL;
+ *file_seals |= F_SEAL_EXEC;
+ }
} else if (flags & MFD_ALLOW_SEALING) {
/* MFD_EXEC and MFD_ALLOW_SEALING are set */
file_seals = memfd_file_seals_ptr(file);
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
+ *file_seals &= ~F_SEAL_SEAL;
}
fd_install(fd, file);
When __blkcg_rstat_flush() is called from cgroup_rstat_flush*() code
path, interrupt is always disabled.
When we start to flush blkcg per-cpu stats list in __blkg_release()
for avoiding to leak blkcg_gq's reference in commit 20cb1c2fb756
("blk-cgroup: Flush stats before releasing blkcg_gq"), local irq
isn't disabled yet, then lockdep warning may be triggered because
the dependent cgroup locks may be acquired from irq(soft irq) handler.
Fix the issue by disabling local irq always.
Fixes: 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Closes: https://lore.kernel.org/linux-block/pz2wzwnmn5tk3pwpskmjhli6g3qly7eoknilb26…
Cc: stable(a)vger.kernel.org
Cc: Jay Shin <jaeshin(a)redhat.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Waiman Long <longman(a)redhat.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-cgroup.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f0b5c9c41cde..dce1548a7a0c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -970,6 +970,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
struct llist_node *lnode;
struct blkg_iostat_set *bisc, *next_bisc;
+ unsigned long flags;
rcu_read_lock();
@@ -983,7 +984,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
* When flushing from cgroup, cgroup_rstat_lock is always held, so
* this lock won't cause contention most of time.
*/
- raw_spin_lock(&blkg_stat_lock);
+ raw_spin_lock_irqsave(&blkg_stat_lock, flags);
/*
* Iterate only the iostat_cpu's queued in the lockless list.
@@ -1009,7 +1010,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
blkcg_iostat_update(parent, &blkg->iostat.cur,
&blkg->iostat.last);
}
- raw_spin_unlock(&blkg_stat_lock);
+ raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);
out:
rcu_read_unlock();
}
--
2.40.1
Fix the timeout that is used for the initialisation and for the self
test. wait_for_completion_timeout expects a timeout in jiffies, but
RNGC_TIMEOUT is in milliseconds. Call msecs_to_jiffies to do the
conversion.
Cc: stable(a)vger.kernel.org
Fixes: 1d5449445bd0 ("hwrng: mx-rngc - add a driver for Freescale RNGC")
Signed-off-by: Martin Kaiser <martin(a)kaiser.cx>
---
drivers/char/hw_random/imx-rngc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/char/hw_random/imx-rngc.c b/drivers/char/hw_random/imx-rngc.c
index 1a6a5dd0a5a1..e5a9dee615c8 100644
--- a/drivers/char/hw_random/imx-rngc.c
+++ b/drivers/char/hw_random/imx-rngc.c
@@ -110,7 +110,7 @@ static int imx_rngc_self_test(struct imx_rngc *rngc)
cmd = readl(rngc->base + RNGC_COMMAND);
writel(cmd | RNGC_CMD_SELF_TEST, rngc->base + RNGC_COMMAND);
- ret = wait_for_completion_timeout(&rngc->rng_op_done, RNGC_TIMEOUT);
+ ret = wait_for_completion_timeout(&rngc->rng_op_done, msecs_to_jiffies(RNGC_TIMEOUT));
imx_rngc_irq_mask_clear(rngc);
if (!ret)
return -ETIMEDOUT;
@@ -182,9 +182,7 @@ static int imx_rngc_init(struct hwrng *rng)
cmd = readl(rngc->base + RNGC_COMMAND);
writel(cmd | RNGC_CMD_SEED, rngc->base + RNGC_COMMAND);
- ret = wait_for_completion_timeout(&rngc->rng_op_done,
- RNGC_TIMEOUT);
-
+ ret = wait_for_completion_timeout(&rngc->rng_op_done, msecs_to_jiffies(RNGC_TIMEOUT));
if (!ret) {
ret = -ETIMEDOUT;
goto err;
--
2.30.2
From: Sheetal <sheetal(a)nvidia.com>
I2S data sanity tests fail beyond a bit clock frequency of 6.144MHz.
This happens because the AHUB clock rate is too low and it shows
9.83MHz on boot.
The maximum rate of PLLA_OUT0 is 49.152MHz and is used to serve I/O
clocks. It is recommended that AHUB clock operates higher than this.
Thus fix this by using PLLP_OUT0 as parent clock for AHUB instead of
PLLA_OUT0 and fix the rate to 81.6MHz.
Fixes: dc94a94daa39 ("arm64: tegra: Add audio devices on Tegra234")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sheetal <sheetal(a)nvidia.com>
Signed-off-by: Sameer Pujar <spujar(a)nvidia.com>
Reviewed-by: Mohan Kumar D <mkumard(a)nvidia.com>
---
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index f4974e8..0f12a8de 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -180,7 +180,8 @@
clocks = <&bpmp TEGRA234_CLK_AHUB>;
clock-names = "ahub";
assigned-clocks = <&bpmp TEGRA234_CLK_AHUB>;
- assigned-clock-parents = <&bpmp TEGRA234_CLK_PLLA_OUT0>;
+ assigned-clock-parents = <&bpmp TEGRA234_CLK_PLLP_OUT0>;
+ assigned-clock-rates = <81600000>;
status = "disabled";
#address-cells = <2>;
--
2.7.4
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
With commit d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return
error on FC timeout on TX path") the missing correct return value in
the case of a protocol error was introduced.
But the way the error value has been read and sent to the user space
does not follow the common scheme to clear the error after reading
which is provided by the sock_error() function. This leads to an error
report at the following write() attempt although everything should be
working.
Fixes: d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path")
Reported-by: Carsten Schmidt <carsten.schmidt-achim(a)t-online.de>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/isotp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 84f9aba02901..ca9d728d6d72 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1112,8 +1112,9 @@ static int isotp_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
if (err)
goto err_event_drop;
- if (sk->sk_err)
- return -sk->sk_err;
+ err = sock_error(sk);
+ if (err)
+ return err;
}
return size;
base-commit: 7f4e09700bdc13ce9aafa279bc999051e9bcda35
--
2.40.1
Fix an issue where the kernel would stall during netboot, showing the
"sched: RT throttling activated" message. This stall was triggered by
the behavior of the mii_interrupt bit (Bit 7 - DP83TD510E_STS_MII_INT)
in the DP83TD510E's PHY_STS Register (Address = 0x10). The DP83TD510E
datasheet (2020) states that the bit clears on write, however, in
practice, the bit clears on read.
This discrepancy had significant implications on the driver's interrupt
handling. The PHY_STS Register was used by handle_interrupt() to check
for pending interrupts and by read_status() to get the current link
status. The call to read_status() was unintentionally clearing the
mii_interrupt status bit without deasserting the IRQ pin, causing
handle_interrupt() to miss other pending interrupts. This issue was most
apparent during netboot.
The fix refrains from using the PHY_STS Register for interrupt handling.
Instead, we now solely rely on the INTERRUPT_REG_1 Register (Address =
0x12) and INTERRUPT_REG_2 Register (Address = 0x13) for this purpose.
These registers directly influence the IRQ pin state and are latched
high until read.
Note: The INTERRUPT_REG_2 Register (Address = 0x13) exists and can also
be used for interrupt handling, specifically for "Aneg page received
interrupt" and "Polarity change interrupt". However, these features are
currently not supported by this driver.
Fixes: 165cd04fe253 ("net: phy: dp83td510: Add support for the DP83TD510 Ethernet PHY")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
---
drivers/net/phy/dp83td510.c | 23 +++++------------------
1 file changed, 5 insertions(+), 18 deletions(-)
diff --git a/drivers/net/phy/dp83td510.c b/drivers/net/phy/dp83td510.c
index 3cd9a77f9532..d7616b13c594 100644
--- a/drivers/net/phy/dp83td510.c
+++ b/drivers/net/phy/dp83td510.c
@@ -12,6 +12,11 @@
/* MDIO_MMD_VEND2 registers */
#define DP83TD510E_PHY_STS 0x10
+/* Bit 7 - mii_interrupt, active high. Clears on read.
+ * Note: Clearing does not necessarily deactivate IRQ pin if interrupts pending.
+ * This differs from the DP83TD510E datasheet (2020) which states this bit
+ * clears on write 0.
+ */
#define DP83TD510E_STS_MII_INT BIT(7)
#define DP83TD510E_LINK_STATUS BIT(0)
@@ -53,12 +58,6 @@ static int dp83td510_config_intr(struct phy_device *phydev)
int ret;
if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
- /* Clear any pending interrupts */
- ret = phy_write_mmd(phydev, MDIO_MMD_VEND2, DP83TD510E_PHY_STS,
- 0x0);
- if (ret)
- return ret;
-
ret = phy_write_mmd(phydev, MDIO_MMD_VEND2,
DP83TD510E_INTERRUPT_REG_1,
DP83TD510E_INT1_LINK_EN);
@@ -81,10 +80,6 @@ static int dp83td510_config_intr(struct phy_device *phydev)
DP83TD510E_GENCFG_INT_EN);
if (ret)
return ret;
-
- /* Clear any pending interrupts */
- ret = phy_write_mmd(phydev, MDIO_MMD_VEND2, DP83TD510E_PHY_STS,
- 0x0);
}
return ret;
@@ -94,14 +89,6 @@ static irqreturn_t dp83td510_handle_interrupt(struct phy_device *phydev)
{
int ret;
- ret = phy_read_mmd(phydev, MDIO_MMD_VEND2, DP83TD510E_PHY_STS);
- if (ret < 0) {
- phy_error(phydev);
- return IRQ_NONE;
- } else if (!(ret & DP83TD510E_STS_MII_INT)) {
- return IRQ_NONE;
- }
-
/* Read the current enabled interrupts */
ret = phy_read_mmd(phydev, MDIO_MMD_VEND2, DP83TD510E_INTERRUPT_REG_1);
if (ret < 0) {
--
2.39.2
Especially on 32-bit systems, it is possible for the pointer arithmetic
to overflow and cause a userspace pointer to be dereferenced in the
kernel.
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
---
drivers/md/dm-ioctl.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 34fa74c6a70db8aa67aaba3f6a2fc4f38ef736bc..64e8f16d344c47057de5e2d29e3d63202197dca0 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1396,6 +1396,25 @@ static int next_target(struct dm_target_spec *last, uint32_t next, void *end,
{
static_assert(_Alignof(struct dm_target_spec) <= 8,
"struct dm_target_spec has excessive alignment requirements");
+ static_assert(offsetof(struct dm_ioctl, data) >= sizeof(struct dm_target_spec),
+ "struct dm_target_spec too big");
+
+ /*
+ * Number of bytes remaining, starting with last. This is always
+ * sizeof(struct dm_target_spec) or more, as otherwise *last was
+ * out of bounds already.
+ */
+ size_t remaining = (char *)end - (char *)last;
+
+ /*
+ * There must be room for both the next target spec and the
+ * NUL-terminator of the target itself.
+ */
+ if (remaining - sizeof(struct dm_target_spec) <= next) {
+ DMERR("Target spec extends beyond end of parameters");
+ return -EINVAL;
+ }
+
if (next % 8) {
DMERR("Next target spec (offset %u) is not 8-byte aligned", next);
return -EINVAL;
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
From: Austin Zheng <austin.zheng(a)amd.com>
[Why]
K1 and K2 not being setting properly when subVP is active.
[How]
Have phantom pipes use the same programing as the main pipes without
checking the paired stream
Cc: stable(a)vger.kernel.org
Reviewed-by: Alvin Lee <alvin.lee2(a)amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira(a)amd.com>
Signed-off-by: Austin Zheng <austin.zheng(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 526e89ef9cdc..d52d5feeb311 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -1147,10 +1147,6 @@ unsigned int dcn32_calculate_dccg_k1_k2_values(struct pipe_ctx *pipe_ctx, unsign
unsigned int odm_combine_factor = 0;
bool two_pix_per_container = false;
- // For phantom pipes, use the same programming as the main pipes
- if (pipe_ctx->stream->mall_stream_config.type == SUBVP_PHANTOM) {
- stream = pipe_ctx->stream->mall_stream_config.paired_stream;
- }
two_pix_per_container = optc2_is_two_pixels_per_containter(&stream->timing);
odm_combine_factor = get_odm_config(pipe_ctx, NULL);
--
2.39.2
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x dc97251bf0b70549c76ba261516c01b8096771c5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062242-ripple-resilient-26a8@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dc97251bf0b70549c76ba261516c01b8096771c5 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Thu, 8 Jun 2023 18:38:47 +0200
Subject: [PATCH] selftests: mptcp: diag: skip listen tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the listen diag dump support introduced by
commit 4fa39b701ce9 ("mptcp: listen diag dump support").
It looks like there is no good pre-check to do here, i.e. dedicated
function available in kallsyms. Instead, we try to get info if nothing
is returned, the test is marked as skipped.
That's not ideal because something could be wrong with the feature and
instead of reporting an error, the test could be marked as skipped. If
we know in advanced that the feature is supposed to be supported, the
tester can set SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var to 1: in
this case the test will report an error instead of marking the test as
skipped if nothing is returned.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: f2ae0fa68e28 ("selftests/mptcp: add diag listen tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 4eacdb1ab962..4a6165389b74 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -55,16 +55,20 @@ __chk_nr()
{
local command="$1"
local expected=$2
- local msg nr
+ local msg="$3"
+ local skip="${4:-SKIP}"
+ local nr
- shift 2
- msg=$*
nr=$(eval $command)
printf "%-50s" "$msg"
if [ $nr != $expected ]; then
- echo "[ fail ] expected $expected found $nr"
- ret=$test_cnt
+ if [ $nr = "$skip" ] && ! mptcp_lib_expect_all_features; then
+ echo "[ skip ] Feature probably not supported"
+ else
+ echo "[ fail ] expected $expected found $nr"
+ ret=$test_cnt
+ fi
else
echo "[ ok ]"
fi
@@ -76,12 +80,12 @@ __chk_msk_nr()
local condition=$1
shift 1
- __chk_nr "ss -inmHMN $ns | $condition" $*
+ __chk_nr "ss -inmHMN $ns | $condition" "$@"
}
chk_msk_nr()
{
- __chk_msk_nr "grep -c token:" $*
+ __chk_msk_nr "grep -c token:" "$@"
}
wait_msk_nr()
@@ -119,37 +123,26 @@ wait_msk_nr()
chk_msk_fallback_nr()
{
- __chk_msk_nr "grep -c fallback" $*
+ __chk_msk_nr "grep -c fallback" "$@"
}
chk_msk_remote_key_nr()
{
- __chk_msk_nr "grep -c remote_key" $*
+ __chk_msk_nr "grep -c remote_key" "$@"
}
__chk_listen()
{
local filter="$1"
local expected=$2
+ local msg="$3"
- shift 2
- msg=$*
-
- nr=$(ss -N $ns -Ml "$filter" | grep -c LISTEN)
- printf "%-50s" "$msg"
-
- if [ $nr != $expected ]; then
- echo "[ fail ] expected $expected found $nr"
- ret=$test_cnt
- else
- echo "[ ok ]"
- fi
+ __chk_nr "ss -N $ns -Ml '$filter' | grep -c LISTEN" "$expected" "$msg" 0
}
chk_msk_listen()
{
lport=$1
- local msg="check for listen socket"
# destination port search should always return empty list
__chk_listen "dport $lport" 0 "listen match for dport $lport"
@@ -167,10 +160,9 @@ chk_msk_listen()
chk_msk_inuse()
{
local expected=$1
+ local msg="$2"
local listen_nr
- shift 1
-
listen_nr=$(ss -N "${ns}" -Ml | grep -c LISTEN)
expected=$((expected + listen_nr))
@@ -181,7 +173,7 @@ chk_msk_inuse()
sleep 0.1
done
- __chk_nr get_msk_inuse $expected $*
+ __chk_nr get_msk_inuse $expected "$msg"
}
# $1: ns, $2: port
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2177d0b08e421971e035672b70f3228d9485c650
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062218-porous-squiggle-d837@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2177d0b08e421971e035672b70f3228d9485c650 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Thu, 8 Jun 2023 18:38:49 +0200
Subject: [PATCH] selftests: mptcp: pm nl: remove hardcoded default limits
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the checks of the default limits returned by the MPTCP
in-kernel path-manager. The default values have been modified by commit
72bcbc46a5c3 ("mptcp: increase default max additional subflows to 2").
Instead of comparing with hardcoded values, we can get the default one
and compare with them.
Note that if we expect to have the latest version, we continue to check
the hardcoded values to avoid unexpected behaviour changes.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: eedbc685321b ("selftests: add PM netlink functional tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index 32f7533e0919..664cafc60705 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -73,8 +73,12 @@ check()
}
check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "defaults addr list"
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
+
+default_limits="$(ip netns exec $ns1 ./pm_nl_ctl limits)"
+if mptcp_lib_expect_all_features; then
+ check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
subflows 2" "defaults limits"
+fi
ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.1
ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.2 flags subflow dev lo
@@ -121,12 +125,10 @@ ip netns exec $ns1 ./pm_nl_ctl flush
check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "flush addrs"
ip netns exec $ns1 ./pm_nl_ctl limits 9 1
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "rcv addrs above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "rcv addrs above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 1 9
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "subflows above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "subflows above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 8 8
check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 8
It looks like this test was broken in v6.1 after the backport of commit
48d73f609dcc ("selftests: mptcp: update userspace pm addr tests").
It was not working because the commit ad3493746ebe ("selftests: mptcp:
add test-cases for mixed v4/v6 subflows") is not in v6.1. This commit
changes how the connections are being created in mptcp_join.sh selftest:
with IPv6 support always enabled. But then in v6.1, the server still
create IPv4 only connections, so without the v4-mapped-v6 format with
the "::ffff:" prefix like we have in v6.3.
The modification here adds a support for connections created in v4 as
well so it fixes the issue in v6.1. This patch is not needed for the
selftests in v6.3 because only IPv6 listening sockets are being created.
Fixes: 8f0ba8ec18f5 ("selftests: mptcp: update userspace pm addr tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2a238ae842d8..7b65003ee8cf 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -946,11 +946,12 @@ do_transfer()
sp=$(grep "type:10" "$evts_ns1" |
sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q')
da=$(grep "type:10" "$evts_ns1" |
- sed -n 's/.*\(daddr6:\)\([0-9a-f:.]*\).*$/\2/p;q')
+ sed -n 's/.*\(daddr[46]:\)\([0-9a-f:.]*\).*$/\2/p;q')
+ echo "$da" | grep -q ":" && addr="::ffff:$addr"
dp=$(grep "type:10" "$evts_ns1" |
sed -n 's/.*\(dport:\)\([[:digit:]]*\).*$/\2/p;q')
ip netns exec ${listener_ns} ./pm_nl_ctl rem token $tk id $id
- ip netns exec ${listener_ns} ./pm_nl_ctl dsf lip "::ffff:$addr" \
+ ip netns exec ${listener_ns} ./pm_nl_ctl dsf lip "$addr" \
lport $sp rip $da rport $dp token $tk
fi
--
2.40.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2177d0b08e421971e035672b70f3228d9485c650
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062217-never-sedan-c4bd@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2177d0b08e421971e035672b70f3228d9485c650 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Thu, 8 Jun 2023 18:38:49 +0200
Subject: [PATCH] selftests: mptcp: pm nl: remove hardcoded default limits
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the checks of the default limits returned by the MPTCP
in-kernel path-manager. The default values have been modified by commit
72bcbc46a5c3 ("mptcp: increase default max additional subflows to 2").
Instead of comparing with hardcoded values, we can get the default one
and compare with them.
Note that if we expect to have the latest version, we continue to check
the hardcoded values to avoid unexpected behaviour changes.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: eedbc685321b ("selftests: add PM netlink functional tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index 32f7533e0919..664cafc60705 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -73,8 +73,12 @@ check()
}
check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "defaults addr list"
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
+
+default_limits="$(ip netns exec $ns1 ./pm_nl_ctl limits)"
+if mptcp_lib_expect_all_features; then
+ check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
subflows 2" "defaults limits"
+fi
ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.1
ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.2 flags subflow dev lo
@@ -121,12 +125,10 @@ ip netns exec $ns1 ./pm_nl_ctl flush
check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "flush addrs"
ip netns exec $ns1 ./pm_nl_ctl limits 9 1
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "rcv addrs above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "rcv addrs above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 1 9
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "subflows above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "subflows above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 8 8
check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 8
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4a0b866a3f7d3c22033f40e93e94befc6fe51bce
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062220-unclamped-fiddle-b56e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a0b866a3f7d3c22033f40e93e94befc6fe51bce Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:40 +0200
Subject: [PATCH] selftests: mptcp: join: skip test if iptables/tc cmds fail
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Some tests are using IPTables and/or TC commands to force some
behaviours. If one of these commands fails -- likely because some
features are not available due to missing kernel config -- we should
intercept the error and skip the tests requiring these features.
Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.
This patch also replaces the 'exit 1' by 'return 1' not to stop the
selftest in the middle without the conclusion if there is an issue with
NF or TC.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 276396cbe60c..c471934ad5e0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -286,11 +286,15 @@ reset_with_add_addr_timeout()
fi
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
- ip netns exec $ns2 $tables -A OUTPUT -p tcp \
- -m tcp --tcp-option 30 \
- -m bpf --bytecode \
- "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
- -j DROP
+
+ if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
+ -m tcp --tcp-option 30 \
+ -m bpf --bytecode \
+ "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
+ -j DROP; then
+ mark_as_skipped "unable to set the 'add addr' rule"
+ return 1
+ fi
}
# $1: test name
@@ -334,17 +338,12 @@ reset_with_allow_join_id0()
# tc action pedit offset 162 out of bounds
#
# Netfilter is used to mark packets with enough data.
-reset_with_fail()
+setup_fail_rules()
{
- reset "${1}" || return 1
-
- ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
- ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
-
check_invert=1
validate_checksum=1
- local i="$2"
- local ip="${3:-4}"
+ local i="$1"
+ local ip="${2:-4}"
local tables
tables="${iptables}"
@@ -359,15 +358,32 @@ reset_with_fail()
-p tcp \
-m length --length 150:9999 \
-m statistic --mode nth --packet 1 --every 99999 \
- -j MARK --set-mark 42 || exit 1
+ -j MARK --set-mark 42 || return ${ksft_skip}
- tc -n $ns2 qdisc add dev ns2eth$i clsact || exit 1
+ tc -n $ns2 qdisc add dev ns2eth$i clsact || return ${ksft_skip}
tc -n $ns2 filter add dev ns2eth$i egress \
protocol ip prio 1000 \
handle 42 fw \
action pedit munge offset 148 u8 invert \
pipe csum tcp \
- index 100 || exit 1
+ index 100 || return ${ksft_skip}
+}
+
+reset_with_fail()
+{
+ reset "${1}" || return 1
+ shift
+
+ ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
+ ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
+
+ local rc=0
+ setup_fail_rules "${@}" || rc=$?
+
+ if [ ${rc} -eq ${ksft_skip} ]; then
+ mark_as_skipped "unable to set the 'fail' rules"
+ return 1
+ fi
}
reset_with_events()
@@ -382,6 +398,25 @@ reset_with_events()
evts_ns2_pid=$!
}
+reset_with_tcp_filter()
+{
+ reset "${1}" || return 1
+ shift
+
+ local ns="${!1}"
+ local src="${2}"
+ local target="${3}"
+
+ if ! ip netns exec "${ns}" ${iptables} \
+ -A INPUT \
+ -s "${src}" \
+ -p tcp \
+ -j "${target}"; then
+ mark_as_skipped "unable to set the filter rules"
+ return 1
+ fi
+}
+
fail_test()
{
ret=1
@@ -745,15 +780,6 @@ pm_nl_check_endpoint()
fi
}
-filter_tcp_from()
-{
- local ns="${1}"
- local src="${2}"
- local target="${3}"
-
- ip netns exec "${ns}" ${iptables} -A INPUT -s "${src}" -p tcp -j "${target}"
-}
-
do_transfer()
{
local listener_ns="$1"
@@ -1975,23 +2001,23 @@ subflows_error_tests()
fi
# multiple subflows, with subflow creation error
- if reset "multi subflows, with failing subflow"; then
+ if reset_with_tcp_filter "multi subflows, with failing subflow" ns1 10.0.3.2 REJECT &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 REJECT
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
chk_join_nr 1 1 1
fi
# multiple subflows, with subflow timeout on MPJ
- if reset "multi subflows, with subflow timeout"; then
+ if reset_with_tcp_filter "multi subflows, with subflow timeout" ns1 10.0.3.2 DROP &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 DROP
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
chk_join_nr 1 1 1
fi
@@ -1999,11 +2025,11 @@ subflows_error_tests()
# multiple subflows, check that the endpoint corresponding to
# closed subflow (due to reset) is not reused if additional
# subflows are added later
- if reset "multi subflows, fair usage on close"; then
+ if reset_with_tcp_filter "multi subflows, fair usage on close" ns1 10.0.3.2 REJECT &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 1
pm_nl_set_limits $ns2 0 1
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 REJECT
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
# mpj subflow will be in TW after the reset
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4a0b866a3f7d3c22033f40e93e94befc6fe51bce
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062219-thank-remedial-82cc@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a0b866a3f7d3c22033f40e93e94befc6fe51bce Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:40 +0200
Subject: [PATCH] selftests: mptcp: join: skip test if iptables/tc cmds fail
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Some tests are using IPTables and/or TC commands to force some
behaviours. If one of these commands fails -- likely because some
features are not available due to missing kernel config -- we should
intercept the error and skip the tests requiring these features.
Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.
This patch also replaces the 'exit 1' by 'return 1' not to stop the
selftest in the middle without the conclusion if there is an issue with
NF or TC.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 276396cbe60c..c471934ad5e0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -286,11 +286,15 @@ reset_with_add_addr_timeout()
fi
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
- ip netns exec $ns2 $tables -A OUTPUT -p tcp \
- -m tcp --tcp-option 30 \
- -m bpf --bytecode \
- "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
- -j DROP
+
+ if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
+ -m tcp --tcp-option 30 \
+ -m bpf --bytecode \
+ "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
+ -j DROP; then
+ mark_as_skipped "unable to set the 'add addr' rule"
+ return 1
+ fi
}
# $1: test name
@@ -334,17 +338,12 @@ reset_with_allow_join_id0()
# tc action pedit offset 162 out of bounds
#
# Netfilter is used to mark packets with enough data.
-reset_with_fail()
+setup_fail_rules()
{
- reset "${1}" || return 1
-
- ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
- ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
-
check_invert=1
validate_checksum=1
- local i="$2"
- local ip="${3:-4}"
+ local i="$1"
+ local ip="${2:-4}"
local tables
tables="${iptables}"
@@ -359,15 +358,32 @@ reset_with_fail()
-p tcp \
-m length --length 150:9999 \
-m statistic --mode nth --packet 1 --every 99999 \
- -j MARK --set-mark 42 || exit 1
+ -j MARK --set-mark 42 || return ${ksft_skip}
- tc -n $ns2 qdisc add dev ns2eth$i clsact || exit 1
+ tc -n $ns2 qdisc add dev ns2eth$i clsact || return ${ksft_skip}
tc -n $ns2 filter add dev ns2eth$i egress \
protocol ip prio 1000 \
handle 42 fw \
action pedit munge offset 148 u8 invert \
pipe csum tcp \
- index 100 || exit 1
+ index 100 || return ${ksft_skip}
+}
+
+reset_with_fail()
+{
+ reset "${1}" || return 1
+ shift
+
+ ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
+ ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
+
+ local rc=0
+ setup_fail_rules "${@}" || rc=$?
+
+ if [ ${rc} -eq ${ksft_skip} ]; then
+ mark_as_skipped "unable to set the 'fail' rules"
+ return 1
+ fi
}
reset_with_events()
@@ -382,6 +398,25 @@ reset_with_events()
evts_ns2_pid=$!
}
+reset_with_tcp_filter()
+{
+ reset "${1}" || return 1
+ shift
+
+ local ns="${!1}"
+ local src="${2}"
+ local target="${3}"
+
+ if ! ip netns exec "${ns}" ${iptables} \
+ -A INPUT \
+ -s "${src}" \
+ -p tcp \
+ -j "${target}"; then
+ mark_as_skipped "unable to set the filter rules"
+ return 1
+ fi
+}
+
fail_test()
{
ret=1
@@ -745,15 +780,6 @@ pm_nl_check_endpoint()
fi
}
-filter_tcp_from()
-{
- local ns="${1}"
- local src="${2}"
- local target="${3}"
-
- ip netns exec "${ns}" ${iptables} -A INPUT -s "${src}" -p tcp -j "${target}"
-}
-
do_transfer()
{
local listener_ns="$1"
@@ -1975,23 +2001,23 @@ subflows_error_tests()
fi
# multiple subflows, with subflow creation error
- if reset "multi subflows, with failing subflow"; then
+ if reset_with_tcp_filter "multi subflows, with failing subflow" ns1 10.0.3.2 REJECT &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 REJECT
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
chk_join_nr 1 1 1
fi
# multiple subflows, with subflow timeout on MPJ
- if reset "multi subflows, with subflow timeout"; then
+ if reset_with_tcp_filter "multi subflows, with subflow timeout" ns1 10.0.3.2 DROP &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 DROP
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
chk_join_nr 1 1 1
fi
@@ -1999,11 +2025,11 @@ subflows_error_tests()
# multiple subflows, check that the endpoint corresponding to
# closed subflow (due to reset) is not reused if additional
# subflows are added later
- if reset "multi subflows, fair usage on close"; then
+ if reset_with_tcp_filter "multi subflows, fair usage on close" ns1 10.0.3.2 REJECT &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 1
pm_nl_set_limits $ns2 0 1
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 REJECT
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
# mpj subflow will be in TW after the reset
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x cdb50525345cf5a8359ee391032ef606a7826f08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062228-transport-graded-b527@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cdb50525345cf5a8359ee391032ef606a7826f08 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:38 +0200
Subject: [PATCH] selftests: mptcp: join: helpers to skip tests
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Here are some helpers that will be used to mark subtests as skipped if a
feature is not supported. Marking as a fix for the commit introducing
this selftest to help with the backports.
While at it, also check if kallsyms feature is available as it will also
be used in the following commits to check if MPTCP features are
available before starting a test.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 74cc8a74a9d6..a63aed145393 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -142,6 +142,7 @@ cleanup_partial()
check_tools()
{
mptcp_lib_check_mptcp
+ mptcp_lib_check_kallsyms
if ! ip -Version &> /dev/null; then
echo "SKIP: Could not run test without ip tool"
@@ -191,6 +192,32 @@ cleanup()
cleanup_partial
}
+# $1: msg
+print_title()
+{
+ printf "%03u %-36s %s" "${TEST_COUNT}" "${TEST_NAME}" "${1}"
+}
+
+# [ $1: fail msg ]
+mark_as_skipped()
+{
+ local msg="${1:-"Feature not supported"}"
+
+ mptcp_lib_fail_if_expected_feature "${msg}"
+
+ print_title "[ skip ] ${msg}"
+ printf "\n"
+}
+
+# $@: condition
+continue_if()
+{
+ if ! "${@}"; then
+ mark_as_skipped
+ return 1
+ fi
+}
+
skip_test()
{
if [ "${#only_tests_ids[@]}" -eq 0 ] && [ "${#only_tests_names[@]}" -eq 0 ]; then
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x cdb50525345cf5a8359ee391032ef606a7826f08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062227-sporting-calcium-8268@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cdb50525345cf5a8359ee391032ef606a7826f08 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:38 +0200
Subject: [PATCH] selftests: mptcp: join: helpers to skip tests
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Here are some helpers that will be used to mark subtests as skipped if a
feature is not supported. Marking as a fix for the commit introducing
this selftest to help with the backports.
While at it, also check if kallsyms feature is available as it will also
be used in the following commits to check if MPTCP features are
available before starting a test.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 74cc8a74a9d6..a63aed145393 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -142,6 +142,7 @@ cleanup_partial()
check_tools()
{
mptcp_lib_check_mptcp
+ mptcp_lib_check_kallsyms
if ! ip -Version &> /dev/null; then
echo "SKIP: Could not run test without ip tool"
@@ -191,6 +192,32 @@ cleanup()
cleanup_partial
}
+# $1: msg
+print_title()
+{
+ printf "%03u %-36s %s" "${TEST_COUNT}" "${TEST_NAME}" "${1}"
+}
+
+# [ $1: fail msg ]
+mark_as_skipped()
+{
+ local msg="${1:-"Feature not supported"}"
+
+ mptcp_lib_fail_if_expected_feature "${msg}"
+
+ print_title "[ skip ] ${msg}"
+ printf "\n"
+}
+
+# $@: condition
+continue_if()
+{
+ if ! "${@}"; then
+ mark_as_skipped
+ return 1
+ fi
+}
+
skip_test()
{
if [ "${#only_tests_ids[@]}" -eq 0 ] && [ "${#only_tests_names[@]}" -eq 0 ]; then
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0c4cd3f86a40028845ad6f8af5b37165666404cd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062203-parched-rippling-0d7c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0c4cd3f86a40028845ad6f8af5b37165666404cd Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:37 +0200
Subject: [PATCH] selftests: mptcp: join: use 'iptables-legacy' if available
IPTables commands using 'iptables-nft' fail on old kernels, at least
5.15 because it doesn't see the default IPTables chains:
$ iptables -L
iptables/1.8.2 Failed to initialize nft: Protocol not supported
As a first step before switching to NFTables, we can use iptables-legacy
if available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 29f0c99d9a46..74cc8a74a9d6 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -25,6 +25,8 @@ capout=""
ns1=""
ns2=""
ksft_skip=4
+iptables="iptables"
+ip6tables="ip6tables"
timeout_poll=30
timeout_test=$((timeout_poll * 2 + 1))
capture=0
@@ -146,7 +148,11 @@ check_tools()
exit $ksft_skip
fi
- if ! iptables -V &> /dev/null; then
+ # Use the legacy version if available to support old kernel versions
+ if iptables-legacy -V &> /dev/null; then
+ iptables="iptables-legacy"
+ ip6tables="ip6tables-legacy"
+ elif ! iptables -V &> /dev/null; then
echo "SKIP: Could not run all tests without iptables tool"
exit $ksft_skip
fi
@@ -247,9 +253,9 @@ reset_with_add_addr_timeout()
reset "${1}" || return 1
- tables="iptables"
+ tables="${iptables}"
if [ $ip -eq 6 ]; then
- tables="ip6tables"
+ tables="${ip6tables}"
fi
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
@@ -314,9 +320,9 @@ reset_with_fail()
local ip="${3:-4}"
local tables
- tables="iptables"
+ tables="${iptables}"
if [ $ip -eq 6 ]; then
- tables="ip6tables"
+ tables="${ip6tables}"
fi
ip netns exec $ns2 $tables \
@@ -704,7 +710,7 @@ filter_tcp_from()
local src="${2}"
local target="${3}"
- ip netns exec "${ns}" iptables -A INPUT -s "${src}" -p tcp -j "${target}"
+ ip netns exec "${ns}" ${iptables} -A INPUT -s "${src}" -p tcp -j "${target}"
}
do_transfer()
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2177d0b08e421971e035672b70f3228d9485c650
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062216-ungodly-humorous-f47d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2177d0b08e421971e035672b70f3228d9485c650 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Thu, 8 Jun 2023 18:38:49 +0200
Subject: [PATCH] selftests: mptcp: pm nl: remove hardcoded default limits
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the checks of the default limits returned by the MPTCP
in-kernel path-manager. The default values have been modified by commit
72bcbc46a5c3 ("mptcp: increase default max additional subflows to 2").
Instead of comparing with hardcoded values, we can get the default one
and compare with them.
Note that if we expect to have the latest version, we continue to check
the hardcoded values to avoid unexpected behaviour changes.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: eedbc685321b ("selftests: add PM netlink functional tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index 32f7533e0919..664cafc60705 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -73,8 +73,12 @@ check()
}
check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "defaults addr list"
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
+
+default_limits="$(ip netns exec $ns1 ./pm_nl_ctl limits)"
+if mptcp_lib_expect_all_features; then
+ check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
subflows 2" "defaults limits"
+fi
ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.1
ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.2 flags subflow dev lo
@@ -121,12 +125,10 @@ ip netns exec $ns1 ./pm_nl_ctl flush
check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "flush addrs"
ip netns exec $ns1 ./pm_nl_ctl limits 9 1
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "rcv addrs above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "rcv addrs above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 1 9
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "subflows above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "subflows above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 8 8
check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 8
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ff8897b5189495b47895ca247b860a29dc04b36b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062254-neglector-upbeat-10ca@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ff8897b5189495b47895ca247b860a29dc04b36b Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:48 +0200
Subject: [PATCH] selftests: mptcp: join: skip fail tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the MP_FAIL / infinite mapping introduced
by commit 1e39e5a32ad7 ("mptcp: infinite mapping sending") and the
following ones.
It is possible to look for one of the infinite mapping counters to know
in advance if the this feature is available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b6e074e171bc ("selftests: mptcp: add infinite map testcase")
Cc: stable(a)vger.kernel.org
Fixes: 2ba18161d407 ("selftests: mptcp: add MP_FAIL reset testcase")
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index f9161ed69b86..7867bad59253 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -384,7 +384,7 @@ setup_fail_rules()
reset_with_fail()
{
- reset "${1}" || return 1
+ reset_check_counter "${1}" "MPTcpExtInfiniteMapTx" || return 1
shift
ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f2b492b04a167261e1c38eb76f78fb4294473a49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062235-census-ramp-a602@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f2b492b04a167261e1c38eb76f78fb4294473a49 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:47 +0200
Subject: [PATCH] selftests: mptcp: join: skip userspace PM tests if not
supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the userspace PM introduced by commit
4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs")
and the following ones.
It is possible to look for the MPTCP pm_type's sysctl knob to know in
advance if the userspace PM is available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index f8e58ebcdd54..f9161ed69b86 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -84,7 +84,7 @@ init_partial()
ip netns add $netns || exit $ksft_skip
ip -net $netns link set lo up
ip netns exec $netns sysctl -q net.mptcp.enabled=1
- ip netns exec $netns sysctl -q net.mptcp.pm_type=0
+ ip netns exec $netns sysctl -q net.mptcp.pm_type=0 2>/dev/null || true
ip netns exec $netns sysctl -q net.ipv4.conf.all.rp_filter=0
ip netns exec $netns sysctl -q net.ipv4.conf.default.rp_filter=0
if [ $checksum -eq 1 ]; then
@@ -3191,7 +3191,8 @@ fail_tests()
userspace_tests()
{
# userspace pm type prevents add_addr
- if reset "userspace pm type prevents add_addr"; then
+ if reset "userspace pm type prevents add_addr" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns1
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
@@ -3202,7 +3203,8 @@ userspace_tests()
fi
# userspace pm type does not echo add_addr without daemon
- if reset "userspace pm no echo w/o daemon"; then
+ if reset "userspace pm no echo w/o daemon" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns2
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
@@ -3213,7 +3215,8 @@ userspace_tests()
fi
# userspace pm type rejects join
- if reset "userspace pm type rejects join"; then
+ if reset "userspace pm type rejects join" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns1
pm_nl_set_limits $ns1 1 1
pm_nl_set_limits $ns2 1 1
@@ -3223,7 +3226,8 @@ userspace_tests()
fi
# userspace pm type does not send join
- if reset "userspace pm type does not send join"; then
+ if reset "userspace pm type does not send join" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns2
pm_nl_set_limits $ns1 1 1
pm_nl_set_limits $ns2 1 1
@@ -3233,7 +3237,8 @@ userspace_tests()
fi
# userspace pm type prevents mp_prio
- if reset "userspace pm type prevents mp_prio"; then
+ if reset "userspace pm type prevents mp_prio" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns1
pm_nl_set_limits $ns1 1 1
pm_nl_set_limits $ns2 1 1
@@ -3244,7 +3249,8 @@ userspace_tests()
fi
# userspace pm type prevents rm_addr
- if reset "userspace pm type prevents rm_addr"; then
+ if reset "userspace pm type prevents rm_addr" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns1
set_userspace_pm $ns2
pm_nl_set_limits $ns1 0 1
@@ -3256,7 +3262,8 @@ userspace_tests()
fi
# userspace pm add & remove address
- if reset_with_events "userspace pm add & remove address"; then
+ if reset_with_events "userspace pm add & remove address" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns1
pm_nl_set_limits $ns2 1 1
run_tests $ns1 $ns2 10.0.1.1 0 userspace_1 0 slow
@@ -3267,7 +3274,8 @@ userspace_tests()
fi
# userspace pm create destroy subflow
- if reset_with_events "userspace pm create destroy subflow"; then
+ if reset_with_events "userspace pm create destroy subflow" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns2
pm_nl_set_limits $ns1 0 1
run_tests $ns1 $ns2 10.0.1.1 0 0 userspace_1 slow
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4a0b866a3f7d3c22033f40e93e94befc6fe51bce
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062218-dude-unshaven-b4cb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a0b866a3f7d3c22033f40e93e94befc6fe51bce Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Sat, 10 Jun 2023 18:11:40 +0200
Subject: [PATCH] selftests: mptcp: join: skip test if iptables/tc cmds fail
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Some tests are using IPTables and/or TC commands to force some
behaviours. If one of these commands fails -- likely because some
features are not available due to missing kernel config -- we should
intercept the error and skip the tests requiring these features.
Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.
This patch also replaces the 'exit 1' by 'return 1' not to stop the
selftest in the middle without the conclusion if there is an issue with
NF or TC.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 276396cbe60c..c471934ad5e0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -286,11 +286,15 @@ reset_with_add_addr_timeout()
fi
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
- ip netns exec $ns2 $tables -A OUTPUT -p tcp \
- -m tcp --tcp-option 30 \
- -m bpf --bytecode \
- "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
- -j DROP
+
+ if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
+ -m tcp --tcp-option 30 \
+ -m bpf --bytecode \
+ "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
+ -j DROP; then
+ mark_as_skipped "unable to set the 'add addr' rule"
+ return 1
+ fi
}
# $1: test name
@@ -334,17 +338,12 @@ reset_with_allow_join_id0()
# tc action pedit offset 162 out of bounds
#
# Netfilter is used to mark packets with enough data.
-reset_with_fail()
+setup_fail_rules()
{
- reset "${1}" || return 1
-
- ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
- ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
-
check_invert=1
validate_checksum=1
- local i="$2"
- local ip="${3:-4}"
+ local i="$1"
+ local ip="${2:-4}"
local tables
tables="${iptables}"
@@ -359,15 +358,32 @@ reset_with_fail()
-p tcp \
-m length --length 150:9999 \
-m statistic --mode nth --packet 1 --every 99999 \
- -j MARK --set-mark 42 || exit 1
+ -j MARK --set-mark 42 || return ${ksft_skip}
- tc -n $ns2 qdisc add dev ns2eth$i clsact || exit 1
+ tc -n $ns2 qdisc add dev ns2eth$i clsact || return ${ksft_skip}
tc -n $ns2 filter add dev ns2eth$i egress \
protocol ip prio 1000 \
handle 42 fw \
action pedit munge offset 148 u8 invert \
pipe csum tcp \
- index 100 || exit 1
+ index 100 || return ${ksft_skip}
+}
+
+reset_with_fail()
+{
+ reset "${1}" || return 1
+ shift
+
+ ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
+ ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
+
+ local rc=0
+ setup_fail_rules "${@}" || rc=$?
+
+ if [ ${rc} -eq ${ksft_skip} ]; then
+ mark_as_skipped "unable to set the 'fail' rules"
+ return 1
+ fi
}
reset_with_events()
@@ -382,6 +398,25 @@ reset_with_events()
evts_ns2_pid=$!
}
+reset_with_tcp_filter()
+{
+ reset "${1}" || return 1
+ shift
+
+ local ns="${!1}"
+ local src="${2}"
+ local target="${3}"
+
+ if ! ip netns exec "${ns}" ${iptables} \
+ -A INPUT \
+ -s "${src}" \
+ -p tcp \
+ -j "${target}"; then
+ mark_as_skipped "unable to set the filter rules"
+ return 1
+ fi
+}
+
fail_test()
{
ret=1
@@ -745,15 +780,6 @@ pm_nl_check_endpoint()
fi
}
-filter_tcp_from()
-{
- local ns="${1}"
- local src="${2}"
- local target="${3}"
-
- ip netns exec "${ns}" ${iptables} -A INPUT -s "${src}" -p tcp -j "${target}"
-}
-
do_transfer()
{
local listener_ns="$1"
@@ -1975,23 +2001,23 @@ subflows_error_tests()
fi
# multiple subflows, with subflow creation error
- if reset "multi subflows, with failing subflow"; then
+ if reset_with_tcp_filter "multi subflows, with failing subflow" ns1 10.0.3.2 REJECT &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 REJECT
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
chk_join_nr 1 1 1
fi
# multiple subflows, with subflow timeout on MPJ
- if reset "multi subflows, with subflow timeout"; then
+ if reset_with_tcp_filter "multi subflows, with subflow timeout" ns1 10.0.3.2 DROP &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 0 2
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 DROP
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
chk_join_nr 1 1 1
fi
@@ -1999,11 +2025,11 @@ subflows_error_tests()
# multiple subflows, check that the endpoint corresponding to
# closed subflow (due to reset) is not reused if additional
# subflows are added later
- if reset "multi subflows, fair usage on close"; then
+ if reset_with_tcp_filter "multi subflows, fair usage on close" ns1 10.0.3.2 REJECT &&
+ continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
pm_nl_set_limits $ns1 0 1
pm_nl_set_limits $ns2 0 1
pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
- filter_tcp_from $ns1 10.0.3.2 REJECT
run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
# mpj subflow will be in TW after the reset
commit eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports")
...was added on the observation of CXL memory not being accessible after
setting up a region on a "cold-plugged" device. A "cold-plugged" CXL
device is one that was not present at boot, so platform-firmware/BIOS
has no chance to set it up.
While it is true that the debug found the enable bit clear in the
host-bridge's instance of the global control register (CXL 3.0
8.2.4.19.2 CXL HDM Decoder Global Control Register), that bit is
described as:
"This bit is only applicable to CXL.mem devices and shall
return 0 on CXL Host Bridges and Upstream Switch Ports."
So it is meant to be zero, and further testing confirmed that this "fix"
had no effect on the failure. Revert it, and be more vigilant about
proposed fixes in the future. Since the original copied stable@, flag
this revert for stable@ as well.
Cc: <stable(a)vger.kernel.org>
Fixes: eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/cxl/core/pci.c | 27 ++++-----------------------
drivers/cxl/cxl.h | 1 -
drivers/cxl/port.c | 14 +++++---------
tools/testing/cxl/Kbuild | 1 -
tools/testing/cxl/test/mock.c | 15 ---------------
5 files changed, 9 insertions(+), 49 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 7440f84be6c8..552203c13b39 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -308,36 +308,17 @@ static void disable_hdm(void *_cxlhdm)
hdm + CXL_HDM_DECODER_CTRL_OFFSET);
}
-int devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm)
+static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
{
- void __iomem *hdm;
+ void __iomem *hdm = cxlhdm->regs.hdm_decoder;
u32 global_ctrl;
- /*
- * If the hdm capability was not mapped there is nothing to enable and
- * the caller is responsible for what happens next. For example,
- * emulate a passthrough decoder.
- */
- if (IS_ERR(cxlhdm))
- return 0;
-
- hdm = cxlhdm->regs.hdm_decoder;
global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-
- /*
- * If the HDM decoder capability was enabled on entry, skip
- * registering disable_hdm() since this decode capability may be
- * owned by platform firmware.
- */
- if (global_ctrl & CXL_HDM_DECODER_ENABLE)
- return 0;
-
writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
hdm + CXL_HDM_DECODER_CTRL_OFFSET);
- return devm_add_action_or_reset(&port->dev, disable_hdm, cxlhdm);
+ return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_enable_hdm, CXL);
int cxl_dvsec_rr_decode(struct device *dev, int d,
struct cxl_endpoint_dvsec_info *info)
@@ -511,7 +492,7 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
if (info->mem_enabled)
return 0;
- rc = devm_cxl_enable_hdm(port, cxlhdm);
+ rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
if (rc)
return rc;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 74548f8f5f4c..d743df66a582 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -717,7 +717,6 @@ struct cxl_endpoint_dvsec_info {
struct cxl_hdm;
struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info);
-int devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm);
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
int devm_cxl_add_passthrough_decoder(struct cxl_port *port);
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 5ffe3c7d2f5e..43718d0396d7 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -60,17 +60,13 @@ static int discover_region(struct device *dev, void *root)
static int cxl_switch_port_probe(struct cxl_port *port)
{
struct cxl_hdm *cxlhdm;
- int rc, nr_dports;
-
- nr_dports = devm_cxl_port_enumerate_dports(port);
- if (nr_dports < 0)
- return nr_dports;
+ int rc;
- cxlhdm = devm_cxl_setup_hdm(port, NULL);
- rc = devm_cxl_enable_hdm(port, cxlhdm);
- if (rc)
+ rc = devm_cxl_port_enumerate_dports(port);
+ if (rc < 0)
return rc;
+ cxlhdm = devm_cxl_setup_hdm(port, NULL);
if (!IS_ERR(cxlhdm))
return devm_cxl_enumerate_decoders(cxlhdm, NULL);
@@ -79,7 +75,7 @@ static int cxl_switch_port_probe(struct cxl_port *port)
return PTR_ERR(cxlhdm);
}
- if (nr_dports == 1) {
+ if (rc == 1) {
dev_dbg(&port->dev, "Fallback to passthrough decoder\n");
return devm_cxl_add_passthrough_decoder(port);
}
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 6f9347ade82c..fba7bec96acd 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -6,7 +6,6 @@ ldflags-y += --wrap=acpi_pci_find_root
ldflags-y += --wrap=nvdimm_bus_register
ldflags-y += --wrap=devm_cxl_port_enumerate_dports
ldflags-y += --wrap=devm_cxl_setup_hdm
-ldflags-y += --wrap=devm_cxl_enable_hdm
ldflags-y += --wrap=devm_cxl_add_passthrough_decoder
ldflags-y += --wrap=devm_cxl_enumerate_decoders
ldflags-y += --wrap=cxl_await_media_ready
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index 284416527644..de3933a776fd 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -149,21 +149,6 @@ struct cxl_hdm *__wrap_devm_cxl_setup_hdm(struct cxl_port *port,
}
EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_setup_hdm, CXL);
-int __wrap_devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm)
-{
- int index, rc;
- struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
-
- if (ops && ops->is_mock_port(port->uport))
- rc = 0;
- else
- rc = devm_cxl_enable_hdm(port, cxlhdm);
- put_cxl_mock_ops(index);
-
- return rc;
-}
-EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_enable_hdm, CXL);
-
int __wrap_devm_cxl_add_passthrough_decoder(struct cxl_port *port)
{
int rc, index;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b631e3a4e94c77c9007d60b577a069c203ce9594
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062206-muzzle-pope-0802@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b631e3a4e94c77c9007d60b577a069c203ce9594 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Thu, 8 Jun 2023 18:38:53 +0200
Subject: [PATCH] selftests: mptcp: sockopt: skip TCP_INQ checks if not
supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is TCP_INQ cmsg support introduced in commit 2c9e77659a0c
("mptcp: add TCP_INQ cmsg support").
It is possible to look for "mptcp_ioctl" in kallsyms because it was
needed to introduce the mentioned feature. We can skip these tests and
not set TCPINQ option if the feature is not supported.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 5cbd886ce2a9 ("selftests: mptcp: add TCP_INQ support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
index 1d4ae8792227..f295a371ff14 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
@@ -187,9 +187,14 @@ do_transfer()
local_addr="0.0.0.0"
fi
+ cmsg="TIMESTAMPNS"
+ if mptcp_lib_kallsyms_has "mptcp_ioctl$"; then
+ cmsg+=",TCPINQ"
+ fi
+
timeout ${timeout_test} \
ip netns exec ${listener_ns} \
- $mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} -c TIMESTAMPNS,TCPINQ \
+ $mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} -c "${cmsg}" \
${local_addr} < "$sin" > "$sout" &
local spid=$!
@@ -197,7 +202,7 @@ do_transfer()
timeout ${timeout_test} \
ip netns exec ${connector_ns} \
- $mptcp_connect -t ${timeout_poll} -M 2 -p $port -s ${cl_proto} -c TIMESTAMPNS,TCPINQ \
+ $mptcp_connect -t ${timeout_poll} -M 2 -p $port -s ${cl_proto} -c "${cmsg}" \
$connect_addr < "$cin" > "$cout" &
local cpid=$!
@@ -313,6 +318,11 @@ do_tcpinq_tests()
{
local lret=0
+ if ! mptcp_lib_kallsyms_has "mptcp_ioctl$"; then
+ echo "INFO: TCP_INQ not supported: SKIP"
+ return
+ fi
+
local args
for args in "-t tcp" "-r tcp"; do
do_tcpinq_test $args
Sometimes I miss the stable announcements cos of the delights of our
corporate email setup, so add me to the 0th mail CC list.
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
---
I dunno how to test this, but touch wood I've not made a hames of
something trivial...
scripts/quilt-mail | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/scripts/quilt-mail b/scripts/quilt-mail
index 5eb70af702..7b77e2970d 100755
--- a/scripts/quilt-mail
+++ b/scripts/quilt-mail
@@ -177,7 +177,8 @@ CC_NAMES=("linux-kernel(a)vger\.kernel\.org"
"f\.fainelli(a)gmail\.com"
"sudipm\.mukherjee(a)gmail\.com"
"srw(a)sladewatkins\.net"
- "rwarsow(a)gmx\.de")
+ "rwarsow(a)gmx\.de"
+ "conor(a)kernel\.org")
#CC_LIST="stable(a)vger\.kernel\.org"
CC_LIST="patches(a)lists.linux.dev"
--
2.40.1
Ahoj!
Vytvořili jsme nástroj pro automatický sběr dat z výrobních stanic, který může zlepšit vaši práci a zvýšit efektivitu procesů.
Toto řešení se rychle instaluje a snadno se používá a poskytne vám cenné údaje o ukazatelích výkonu pro celý závod a jednotlivé stroje. Nástroj jasně ukazuje, kdy se stroj/linka zpomaluje, jaké má prostoje, a vy tak víte, kdy zareagovat.
Získáte cenné informace o stavu výroby a důvodech možných zpoždění. Díky schopnosti okamžitě reagovat na vznikající problémy a příznaky zvyšujete efektivitu své práce o několik desítek procent.
Chcete takové řešení zdarma otestovat ve vaší firmě?
Pozdravy
Michal Rmoutil
commit d8e45bf1aed2 upstream.
(selftests/mount_setattr: fix redefine struct mount_attr build error)
Backport this commit from v6.2.0-rc5 to v6.3, v6.1, and v5.15 to resolve
the struct redefinition error:
mount_setattr_test.c:107:8: error: redefinition of 'struct mount_attr'
107 | struct mount_attr {
| ^~~~~~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/mount.h:32,
from mount_setattr_test.c:10:
../../../../usr/include/linux/mount.h:129:8: note: originally defined here
129 | struct mount_attr {
This error is caused by the upstream commit f1594bc67657
(selftests mount: Fix mount_setattr_test builds failed) backported
to v5.15
Thanks,
Hardik
----- Forwarded message from "Michael S. Tsirkin" <mst(a)redhat.com> -----
From: "Michael S. Tsirkin" <mst(a)redhat.com>
Date: Fri, 9 Jun 2023 03:27:28 -0400
To: linux-kernel(a)vger.kernel.org
Cc: kernel test robot <lkp(a)intel.com>, Suwan Kim <suwan.kim027(a)gmail.com>, "Roberts, Martin" <martin.roberts(a)intel.com>, Jason Wang
<jasowang(a)redhat.com>, Paolo Bonzini <pbonzini(a)redhat.com>, Stefan Hajnoczi <stefanha(a)redhat.com>, Xuan Zhuo
<xuanzhuo(a)linux.alibaba.com>, Jens Axboe <axboe(a)kernel.dk>, virtualization(a)lists.linux-foundation.org,
linux-block(a)vger.kernel.org
Subject: [PATCH v2] Revert "virtio-blk: support completion batching for the IRQ path"
Message-ID: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst(a)redhat.com>
This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp
i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.
After a few, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.
'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...
Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0 vq=0 1 2 3
[a]request_count - 839416590 813148916 105586179 84988123
[b]completion1_count - 839416590 813148916 105586179 84988123
[c]completion2_count - 0 0 0 0
PF= 1 vq=0 1 2 3
[a]request_count - 823335887 812516140 104582672 75856549
[b]completion1_count - 823335887 812516140 104582672 75856549
[c]completion2_count - 0 0 0 0
i.e. the issue is after the virtio-blk driver.
This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
blk_mq_complete_request(req);
}
kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
virtblk_request_done(req); //force this here...
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
Perhaps you might like to fix/test/revert this change...
Martin
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027(a)gmail.com>
Reported-by: "Roberts, Martin" <martin.roberts(a)intel.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
---
Since v1:
fix build error
Still completely untested as I'm traveling.
Martin, Suwan, could you please test and report?
Suwan if you have a better revert in mind pls post and
I will be happy to drop this.
Thanks!
drivers/block/virtio_blk.c | 82 +++++++++++++++++---------------------
1 file changed, 37 insertions(+), 45 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2b918e28acaa..b47358da92a2 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -348,63 +348,33 @@ static inline void virtblk_request_done(struct request *req)
blk_mq_end_request(req, status);
}
-static void virtblk_complete_batch(struct io_comp_batch *iob)
-{
- struct request *req;
-
- rq_list_for_each(&iob->req_list, req) {
- virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
- virtblk_cleanup_cmd(req);
- }
- blk_mq_end_request_batch(iob);
-}
-
-static int virtblk_handle_req(struct virtio_blk_vq *vq,
- struct io_comp_batch *iob)
-{
- struct virtblk_req *vbr;
- int req_done = 0;
- unsigned int len;
-
- while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
- struct request *req = blk_mq_rq_from_pdu(vbr);
-
- if (likely(!blk_should_fake_timeout(req->q)) &&
- !blk_mq_complete_request_remote(req) &&
- !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
- virtblk_complete_batch))
- virtblk_request_done(req);
- req_done++;
- }
-
- return req_done;
-}
-
static void virtblk_done(struct virtqueue *vq)
{
struct virtio_blk *vblk = vq->vdev->priv;
- struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index];
- int req_done = 0;
+ bool req_done = false;
+ int qid = vq->index;
+ struct virtblk_req *vbr;
unsigned long flags;
- DEFINE_IO_COMP_BATCH(iob);
+ unsigned int len;
- spin_lock_irqsave(&vblk_vq->lock, flags);
+ spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
do {
virtqueue_disable_cb(vq);
- req_done += virtblk_handle_req(vblk_vq, &iob);
+ while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
+ struct request *req = blk_mq_rq_from_pdu(vbr);
+ if (likely(!blk_should_fake_timeout(req->q)))
+ blk_mq_complete_request(req);
+ req_done = true;
+ }
if (unlikely(virtqueue_is_broken(vq)))
break;
} while (!virtqueue_enable_cb(vq));
- if (req_done) {
- if (!rq_list_empty(iob.req_list))
- iob.complete(&iob);
-
- /* In case queue is stopped waiting for more buffers. */
+ /* In case queue is stopped waiting for more buffers. */
+ if (req_done)
blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
- }
- spin_unlock_irqrestore(&vblk_vq->lock, flags);
+ spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
}
static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
@@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set)
}
}
+static void virtblk_complete_batch(struct io_comp_batch *iob)
+{
+ struct request *req;
+
+ rq_list_for_each(&iob->req_list, req) {
+ virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
+ virtblk_cleanup_cmd(req);
+ }
+ blk_mq_end_request_batch(iob);
+}
+
static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
{
struct virtio_blk *vblk = hctx->queue->queuedata;
struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
+ struct virtblk_req *vbr;
unsigned long flags;
+ unsigned int len;
int found = 0;
spin_lock_irqsave(&vq->lock, flags);
- found = virtblk_handle_req(vq, iob);
+
+ while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
+ struct request *req = blk_mq_rq_from_pdu(vbr);
+
+ found++;
+ if (!blk_mq_complete_request_remote(req) &&
+ !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
+ virtblk_complete_batch))
+ virtblk_request_done(req);
+ }
if (found)
blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
--
MST
----- End forwarded message -----
Patch 1 correctly handles disconnect() failures that can happen in some
specific cases: now the socket state is set as unconnected as expected.
That fixes an issue introduced in v6.2.
Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
to a recent one from Eric Dumazet for TCP introducing sk_wait_pending
flag. It should address an issue present in MPTCP from almost the
beginning, from v5.9.
Patch 3 fixes a possible list corruption on passive MPJ even if the race
seems very unlikely, better be safe than sorry. The possible issue is
present from v5.17.
Patch 4 consolidates fallback and non fallback state machines to avoid
leaking some MPTCP sockets. The fix is likely needed for versions from
v5.11.
Patch 5 drops code that is no longer used after the introduction of
patch 4/6. This is not really a fix but this patch can probably land in
the -net tree as well not to leave unused code.
Patch 6 ensures listeners are unhashed before updating their sk status
to avoid possible deadlocks when diag info are going to be retrieved
with a lock. Even if it should not be visible with the way we are
currently getting diag info, the issue is present from v5.17.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Paolo Abeni (6):
mptcp: handle correctly disconnect() failures
mptcp: fix possible divide by zero in recvmsg()
mptcp: fix possible list corruption on passive MPJ
mptcp: consolidate fallback and non fallback state machine
mptcp: drop legacy code around RX EOF
mptcp: ensure listener is unhashed before updating the sk status
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.c | 160 ++++++++++++++++++++-----------------------------
net/mptcp/protocol.h | 5 +-
net/mptcp/subflow.c | 17 +++---
4 files changed, 76 insertions(+), 107 deletions(-)
---
base-commit: 9a43827e876c9a071826cc81783aa2222b020f1d
change-id: 20230620-upstream-net-20230620-misc-fixes-for-v6-4-55ef43802324
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Currently, it is possible for us to access memory that we shouldn't.
Since, we acquire (possibly dangling) pointers to dirty rectangles
before doing a bounds check to make sure we can actually accommodate the
number of dirty rectangles userspace has requested to fill. This issue
is especially evident if a compositor requests both MPO and damage clips
at the same time, in which case I have observed a soft-hang. So, to
avoid this issue, perform the bounds check before filling a single dirty
rectangle and WARN() about it, if it is ever attempted in
fill_dc_dirty_rect().
Cc: stable(a)vger.kernel.org # 6.1+
Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 64b8dcf8dbda..66bb03d503ea 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5065,11 +5065,7 @@ static inline void fill_dc_dirty_rect(struct drm_plane *plane,
s32 y, s32 width, s32 height,
int *i, bool ffu)
{
- if (*i > DC_MAX_DIRTY_RECTS)
- return;
-
- if (*i == DC_MAX_DIRTY_RECTS)
- goto out;
+ WARN_ON(*i >= DC_MAX_DIRTY_RECTS);
dirty_rect->x = x;
dirty_rect->y = y;
@@ -5085,7 +5081,6 @@ static inline void fill_dc_dirty_rect(struct drm_plane *plane,
"[PLANE:%d] PSR SU dirty rect at (%d, %d) size (%d, %d)",
plane->base.id, x, y, width, height);
-out:
(*i)++;
}
@@ -5172,6 +5167,9 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
*dirty_regions_changed = bb_changed;
+ if ((num_clips + (bb_changed ? 2 : 0)) > DC_MAX_DIRTY_RECTS)
+ goto ffu;
+
if (bb_changed) {
fill_dc_dirty_rect(new_plane_state->plane, &dirty_rects[i],
new_plane_state->crtc_x,
@@ -5201,9 +5199,6 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
new_plane_state->crtc_h, &i, false);
}
- if (i > DC_MAX_DIRTY_RECTS)
- goto ffu;
-
flip_addrs->dirty_rect_count = i;
return;
--
2.40.1
The restoring of TPIDR2 signal context has been broken since it was
merged, fix this and add a test case covering it. This is a result of
TPIDR2 context management following a different flow to any of the other
state that we provide and the fact that we don't expose TPIDR (which
follows the same pattern) to signals.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
arm64/signal: Restore TPIDR2 register rather than memory state
kselftest/arm64: Add a test case for TPIDR2 restore
arch/arm64/kernel/signal.c | 2 +-
tools/testing/selftests/arm64/signal/.gitignore | 2 +-
.../arm64/signal/testcases/tpidr2_restore.c | 85 ++++++++++++++++++++++
3 files changed, 87 insertions(+), 2 deletions(-)
---
base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375
change-id: 20230621-arm64-fix-tpidr2-signal-restore-713d93798f99
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 306320034e8fbe7ee1cc4f5269c55658b4612048
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023061830-rubbed-stubble-2775@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 306320034e8fbe7ee1cc4f5269c55658b4612048 Mon Sep 17 00:00:00 2001
From: Bernhard Seibold <mail(a)bernhard-seibold.de>
Date: Fri, 2 Jun 2023 15:30:29 +0200
Subject: [PATCH] serial: lantiq: add missing interrupt ack
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, the error interrupt is never acknowledged, so once active it
will stay active indefinitely, causing the handler to be called in an
infinite loop.
Fixes: 2f0fc4159a6a ("SERIAL: Lantiq: Add driver for MIPS Lantiq SOCs.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Bernhard Seibold <mail(a)bernhard-seibold.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Message-ID: <20230602133029.546-1-mail(a)bernhard-seibold.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/lantiq.c b/drivers/tty/serial/lantiq.c
index a58e9277dfad..f1387f1024db 100644
--- a/drivers/tty/serial/lantiq.c
+++ b/drivers/tty/serial/lantiq.c
@@ -250,6 +250,7 @@ lqasc_err_int(int irq, void *_port)
struct ltq_uart_port *ltq_port = to_ltq_uart_port(port);
spin_lock_irqsave(<q_port->lock, flags);
+ __raw_writel(ASC_IRNCR_EIR, port->membase + LTQ_ASC_IRNCR);
/* clear any pending interrupts */
asc_update_bits(0, ASCWHBSTATE_CLRPE | ASCWHBSTATE_CLRFE |
ASCWHBSTATE_CLRROE, port->membase + LTQ_ASC_WHBSTATE);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5005bcb4219156f1bf7587b185080ec1da08518e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062114-jet-underwire-9543@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5005bcb4219156f1bf7587b185080ec1da08518e Mon Sep 17 00:00:00 2001
From: Namjae Jeon <linkinjeon(a)kernel.org>
Date: Thu, 15 Jun 2023 22:05:29 +0900
Subject: [PATCH] ksmbd: validate session id and tree id in the compound
request
This patch validate session id and tree id in compound request.
If first operation in the compound is SMB2 ECHO request, ksmbd bypass
session and tree validation. So work->sess and work->tcon could be NULL.
If secound request in the compound access work->sess or tcon, It cause
NULL pointer dereferecing error.
Cc: stable(a)vger.kernel.org
Reported-by: zdi-disclosures(a)trendmicro.com # ZDI-CAN-21165
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/server.c b/fs/smb/server/server.c
index f9b2e0f19b03..ced7a9e916f0 100644
--- a/fs/smb/server/server.c
+++ b/fs/smb/server/server.c
@@ -185,24 +185,31 @@ static void __handle_ksmbd_work(struct ksmbd_work *work,
goto send;
}
- if (conn->ops->check_user_session) {
- rc = conn->ops->check_user_session(work);
- if (rc < 0) {
- command = conn->ops->get_cmd_val(work);
- conn->ops->set_rsp_status(work,
- STATUS_USER_SESSION_DELETED);
- goto send;
- } else if (rc > 0) {
- rc = conn->ops->get_ksmbd_tcon(work);
+ do {
+ if (conn->ops->check_user_session) {
+ rc = conn->ops->check_user_session(work);
if (rc < 0) {
- conn->ops->set_rsp_status(work,
- STATUS_NETWORK_NAME_DELETED);
+ if (rc == -EINVAL)
+ conn->ops->set_rsp_status(work,
+ STATUS_INVALID_PARAMETER);
+ else
+ conn->ops->set_rsp_status(work,
+ STATUS_USER_SESSION_DELETED);
goto send;
+ } else if (rc > 0) {
+ rc = conn->ops->get_ksmbd_tcon(work);
+ if (rc < 0) {
+ if (rc == -EINVAL)
+ conn->ops->set_rsp_status(work,
+ STATUS_INVALID_PARAMETER);
+ else
+ conn->ops->set_rsp_status(work,
+ STATUS_NETWORK_NAME_DELETED);
+ goto send;
+ }
}
}
- }
- do {
rc = __process_request(work, conn, &command);
if (rc == SERVER_HANDLER_ABORT)
break;
diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index ccecdb71d2bc..da1787c68ba0 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -91,7 +91,6 @@ int smb2_get_ksmbd_tcon(struct ksmbd_work *work)
unsigned int cmd = le16_to_cpu(req_hdr->Command);
int tree_id;
- work->tcon = NULL;
if (cmd == SMB2_TREE_CONNECT_HE ||
cmd == SMB2_CANCEL_HE ||
cmd == SMB2_LOGOFF_HE) {
@@ -105,10 +104,28 @@ int smb2_get_ksmbd_tcon(struct ksmbd_work *work)
}
tree_id = le32_to_cpu(req_hdr->Id.SyncId.TreeId);
+
+ /*
+ * If request is not the first in Compound request,
+ * Just validate tree id in header with work->tcon->id.
+ */
+ if (work->next_smb2_rcv_hdr_off) {
+ if (!work->tcon) {
+ pr_err("The first operation in the compound does not have tcon\n");
+ return -EINVAL;
+ }
+ if (work->tcon->id != tree_id) {
+ pr_err("tree id(%u) is different with id(%u) in first operation\n",
+ tree_id, work->tcon->id);
+ return -EINVAL;
+ }
+ return 1;
+ }
+
work->tcon = ksmbd_tree_conn_lookup(work->sess, tree_id);
if (!work->tcon) {
pr_err("Invalid tid %d\n", tree_id);
- return -EINVAL;
+ return -ENOENT;
}
return 1;
@@ -547,7 +564,6 @@ int smb2_check_user_session(struct ksmbd_work *work)
unsigned int cmd = conn->ops->get_cmd_val(work);
unsigned long long sess_id;
- work->sess = NULL;
/*
* SMB2_ECHO, SMB2_NEGOTIATE, SMB2_SESSION_SETUP command do not
* require a session id, so no need to validate user session's for
@@ -558,15 +574,33 @@ int smb2_check_user_session(struct ksmbd_work *work)
return 0;
if (!ksmbd_conn_good(conn))
- return -EINVAL;
+ return -EIO;
sess_id = le64_to_cpu(req_hdr->SessionId);
+
+ /*
+ * If request is not the first in Compound request,
+ * Just validate session id in header with work->sess->id.
+ */
+ if (work->next_smb2_rcv_hdr_off) {
+ if (!work->sess) {
+ pr_err("The first operation in the compound does not have sess\n");
+ return -EINVAL;
+ }
+ if (work->sess->id != sess_id) {
+ pr_err("session id(%llu) is different with the first operation(%lld)\n",
+ sess_id, work->sess->id);
+ return -EINVAL;
+ }
+ return 1;
+ }
+
/* Check for validity of user session */
work->sess = ksmbd_session_lookup_all(conn, sess_id);
if (work->sess)
return 1;
ksmbd_debug(SMB, "Invalid user session, Uid %llu\n", sess_id);
- return -EINVAL;
+ return -ENOENT;
}
static void destroy_previous_session(struct ksmbd_conn *conn,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5fe7f7b78290638806211046a99f031ff26164e1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062136-xbox-tidal-465b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5fe7f7b78290638806211046a99f031ff26164e1 Mon Sep 17 00:00:00 2001
From: Namjae Jeon <linkinjeon(a)kernel.org>
Date: Thu, 15 Jun 2023 22:04:40 +0900
Subject: [PATCH] ksmbd: fix out-of-bound read in smb2_write
ksmbd_smb2_check_message doesn't validate hdr->NextCommand. If
->NextCommand is bigger than Offset + Length of smb2 write, It will
allow oversized smb2 write length. It will cause OOB read in smb2_write.
Cc: stable(a)vger.kernel.org
Reported-by: zdi-disclosures(a)trendmicro.com # ZDI-CAN-21164
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb2misc.c b/fs/smb/server/smb2misc.c
index 57749f41b991..33b7e6c4ceff 100644
--- a/fs/smb/server/smb2misc.c
+++ b/fs/smb/server/smb2misc.c
@@ -351,10 +351,16 @@ int ksmbd_smb2_check_message(struct ksmbd_work *work)
int command;
__u32 clc_len; /* calculated length */
__u32 len = get_rfc1002_len(work->request_buf);
- __u32 req_struct_size;
+ __u32 req_struct_size, next_cmd = le32_to_cpu(hdr->NextCommand);
- if (le32_to_cpu(hdr->NextCommand) > 0)
- len = le32_to_cpu(hdr->NextCommand);
+ if ((u64)work->next_smb2_rcv_hdr_off + next_cmd > len) {
+ pr_err("next command(%u) offset exceeds smb msg size\n",
+ next_cmd);
+ return 1;
+ }
+
+ if (next_cmd > 0)
+ len = next_cmd;
else if (work->next_smb2_rcv_hdr_off)
len -= work->next_smb2_rcv_hdr_off;
Hi,
ea2062dd1f03 ("drm/amd/display: fix the system hang while disable PSR")
was tagged for stable, but failed to apply to 6.3.y, 6.1.y and 5.15.y.
I've looked into the missing dependencies, and here are the dependencies
needed for the stable backport:
5.15.y:
-------
97ca308925a5 ("drm/amd/display: Add minimal pipe split transition state")
f7511289821f ("drm/amd/display: Use dc_update_planes_and_stream")
81f743a08f3b ("drm/amd/display: Add wrapper to call planes and stream
update")
ea2062dd1f03 ("drm/amd/display: fix the system hang while disable PSR")
6.1.y / 6.3.y
-------------
ea2062dd1f03 ("drm/amd/display: fix the system hang while disable PSR")
f7511289821f ("drm/amd/display: Use dc_update_planes_and_stream")
81f743a08f3b ("drm/amd/display: Add wrapper to call planes and stream
update")
ea2062dd1f03 ("drm/amd/display: fix the system hang while disable PSR")
Thanks!
Hi,
This patch fixes a deadlock that can render a system frozen when
reading user memory from BPF.
Ideally, it should be applied to any supported revision equal to or
greater than 5.19.
patch subject: mm: Fix copy_from_user_nofault().
git revision: d319f344561de23e810515d109c7278919bff7b0
Thanks,
--
Fco. Javier Honduvilla Coto
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
commit e18eb8783ec4949adebc7d7b0fdb65f65bfeefd9 upstream.
Currently the tracing_reset_all_online_cpus() requires the
trace_types_lock held. But only one caller of this function actually has
that lock held before calling it, and the other just takes the lock so
that it can call it. More users of this function is needed where the lock
is not held.
Add a tracing_reset_all_online_cpus_unlocked() function for the one use
case that calls it without being held, and also add a lockdep_assert to
make sure it is held when called.
Then have tracing_reset_all_online_cpus() take the lock internally, such
that callers do not need to worry about taking it.
Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
[Refers to commit message of 417d5ea6e735e5d88ffb6c436cf2938f3f476dd1,
this patch is pre-depended, and tracing_reset_all_online_cpus() should
be called after trace_types_lock is held as its comment describes.]
Fixes: 417d5ea6e735 ("tracing: Free buffers when a used dynamic event is removed")
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
---
kernel/trace/trace.c | 11 ++++++++++-
kernel/trace/trace.h | 1 +
kernel/trace/trace_events.c | 2 +-
kernel/trace/trace_events_synth.c | 2 --
4 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f8b5abd6e36c..0202f23ae960 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2175,10 +2175,12 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
}
/* Must have trace_types_lock held */
-void tracing_reset_all_online_cpus(void)
+void tracing_reset_all_online_cpus_unlocked(void)
{
struct trace_array *tr;
+ lockdep_assert_held(&trace_types_lock);
+
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
if (!tr->clear_trace)
continue;
@@ -2190,6 +2192,13 @@ void tracing_reset_all_online_cpus(void)
}
}
+void tracing_reset_all_online_cpus(void)
+{
+ mutex_lock(&trace_types_lock);
+ tracing_reset_all_online_cpus_unlocked();
+ mutex_unlock(&trace_types_lock);
+}
+
/*
* The tgid_map array maps from pid to tgid; i.e. the value stored at index i
* is the tgid last observed corresponding to pid=i.
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 66b6c8395fbc..2c3d9b6ce148 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -580,6 +580,7 @@ int tracing_is_enabled(void);
void tracing_reset_online_cpus(struct array_buffer *buf);
void tracing_reset_current(int cpu);
void tracing_reset_all_online_cpus(void);
+void tracing_reset_all_online_cpus_unlocked(void);
int tracing_open_generic(struct inode *inode, struct file *filp);
int tracing_open_generic_tr(struct inode *inode, struct file *filp);
bool tracing_is_disabled(void);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 1aadc9a6487b..160298d285c0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2974,7 +2974,7 @@ static void trace_module_remove_events(struct module *mod)
* over from this module may be passed to the new module events and
* unexpected results may occur.
*/
- tracing_reset_all_online_cpus();
+ tracing_reset_all_online_cpus_unlocked();
}
static int trace_module_notify(struct notifier_block *self,
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 2fdf3fd591e1..08c7df42ade7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -1416,7 +1416,6 @@ int synth_event_delete(const char *event_name)
mutex_unlock(&event_mutex);
if (mod) {
- mutex_lock(&trace_types_lock);
/*
* It is safest to reset the ring buffer if the module
* being unloaded registered any events that were
@@ -1428,7 +1427,6 @@ int synth_event_delete(const char *event_name)
* occur.
*/
tracing_reset_all_online_cpus();
- mutex_unlock(&trace_types_lock);
}
return ret;
--
2.25.1
Please backport the following changes to the 4.18 stable kernel:
e1891cffd4c4 "ipmi: Make the smi watcher be disabled immediately when not needed"
383035211c79 "ipmi: move message error checking to avoid deadlock"
e1891cffd4c4 doesn't apply completely cleanly because of other changes,
but you just need to leave in the free_user_work() function and delete
the other function in the conflict. I can also supply a patch if
necessary.
Change
b4a34aa6d "ipmi: Fix how the lower layers are told to watch for messages"
was backported to fullfill a dependency for another backport, but there
was another change:
e1891cffd4c4 "ipmi: Make the smi watcher be disabled immediately when not needed"
That is needed to avoid calling a lower layer function with
xmit_msgs_lock held. In addition to that, you will also need:
383035211c79 "ipmi: move message error checking to avoid deadlock"
to fix a bug in that change.
e1891cffd4c4 came in 5.1 and 383035211c79 came in 5.4 (and I believe was
backported) so everything should be good for 5.4 and later. b4a34aa6d
was not backported to 4.14, so it is also ok. So 4.19 is the only
kernel that needs the change.
Thanks to Janne Huttunen for quick work on this.
-corey
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 97b6b9cbba40a21c1d9a344d5c1991f8cfbf136e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023061700-surplus-art-1fef@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 97b6b9cbba40a21c1d9a344d5c1991f8cfbf136e Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Fri, 19 May 2023 16:47:37 +0200
Subject: [PATCH] x86/purgatory: remove PGO flags
If profile-guided optimization is enabled, the purgatory ends up with
multiple .text sections. This is not supported by kexec and crashes the
system.
Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-2-b05c520b7296@chromium…
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Palmer Dabbelt <palmer(a)rivosinc.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: Ross Zwisler <zwisler(a)google.com>
Cc: Simon Horman <horms(a)kernel.org>
Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Tom Rix <trix(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 82fec66d46d2..42abd6af1198 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -14,6 +14,11 @@ $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
CFLAGS_sha256.o := -D__DISABLE_EXPORTS
+# When profile-guided optimization is enabled, llvm emits two different
+# overlapping text sections, which is not supported by kexec. Remove profile
+# optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
+
# When linking purgatory.ro with -r unresolved symbols are not checked,
# also link a purgatory.chk binary without -r to check for unresolved symbols.
PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
commit 92c5d1b860e9581d64baca76779576c0ab0d943d upstream.
The current sanity check for nilfs2 geometry information lacks checks for
the number of segments stored in superblocks, so even for device images
that have been destructively truncated or have an unusually high number of
segments, the mount operation may succeed.
This causes out-of-bounds block I/O on file system block reads or log
writes to the segments, the latter in particular causing
"a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to
hang.
Fix this issue by checking the number of segments stored in the superblock
and avoiding mounting devices that can cause out-of-bounds accesses. To
eliminate the possibility of overflow when calculating the number of
blocks required for the device from the number of segments, this also adds
a helper function to calculate the upper bound on the number of segments
and inserts a check using it.
Link: https://lkml.kernel.org/r/20230526021332.3431-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+7d50f1e54a12ba3aeae2(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please apply this patch to the above stable trees instead of the patch
that could not be applied to them. The hang issue reported by syzbot was
confirmed to reproduce on these stable kernels using its reproducer.
This fixes it.
In this patch, "sb_bdev_nr_blocks()" and "nilfs_err()" are replaced with
their equivalents since they don't yet exist in these kernels. With these
tweaks, this patch is applicable from v4.8 to v5.8. Also, this patch has
been tested against the title stable trees.
fs/nilfs2/the_nilfs.c | 44 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 24f626e7d012..d550a564645e 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -375,6 +375,18 @@ unsigned long nilfs_nrsvsegs(struct the_nilfs *nilfs, unsigned long nsegs)
100));
}
+/**
+ * nilfs_max_segment_count - calculate the maximum number of segments
+ * @nilfs: nilfs object
+ */
+static u64 nilfs_max_segment_count(struct the_nilfs *nilfs)
+{
+ u64 max_count = U64_MAX;
+
+ do_div(max_count, nilfs->ns_blocks_per_segment);
+ return min_t(u64, max_count, ULONG_MAX);
+}
+
void nilfs_set_nsegments(struct the_nilfs *nilfs, unsigned long nsegs)
{
nilfs->ns_nsegments = nsegs;
@@ -384,6 +396,8 @@ void nilfs_set_nsegments(struct the_nilfs *nilfs, unsigned long nsegs)
static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
struct nilfs_super_block *sbp)
{
+ u64 nsegments, nblocks;
+
if (le32_to_cpu(sbp->s_rev_level) < NILFS_MIN_SUPP_REV) {
nilfs_msg(nilfs->ns_sb, KERN_ERR,
"unsupported revision (superblock rev.=%d.%d, current rev.=%d.%d). Please check the version of mkfs.nilfs(2).",
@@ -430,7 +444,35 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
return -EINVAL;
}
- nilfs_set_nsegments(nilfs, le64_to_cpu(sbp->s_nsegments));
+ nsegments = le64_to_cpu(sbp->s_nsegments);
+ if (nsegments > nilfs_max_segment_count(nilfs)) {
+ nilfs_msg(nilfs->ns_sb, KERN_ERR,
+ "segment count %llu exceeds upper limit (%llu segments)",
+ (unsigned long long)nsegments,
+ (unsigned long long)nilfs_max_segment_count(nilfs));
+ return -EINVAL;
+ }
+
+ nblocks = (u64)i_size_read(nilfs->ns_sb->s_bdev->bd_inode) >>
+ nilfs->ns_sb->s_blocksize_bits;
+ if (nblocks) {
+ u64 min_block_count = nsegments * nilfs->ns_blocks_per_segment;
+ /*
+ * To avoid failing to mount early device images without a
+ * second superblock, exclude that block count from the
+ * "min_block_count" calculation.
+ */
+
+ if (nblocks < min_block_count) {
+ nilfs_msg(nilfs->ns_sb, KERN_ERR,
+ "total number of segment blocks %llu exceeds device size (%llu blocks)",
+ (unsigned long long)min_block_count,
+ (unsigned long long)nblocks);
+ return -EINVAL;
+ }
+ }
+
+ nilfs_set_nsegments(nilfs, nsegments);
nilfs->ns_crc_seed = le32_to_cpu(sbp->s_crc_seed);
return 0;
}
--
2.39.3
Dear Sir,
Need funding for your project or your business ? We are looking for
foreign direct investment partners in any of the sectors stated below and we are
willing to provide financing for up to US$ ten Billion to corporate
bodies, companies, industries and entrepreneurs with profitable
business ideas and investment projects that can generate the required
ROI, so you can draw from this opportunity. We are currently providing
funds in any of the sectors stated below. Energy & Power,
construction, Agriculture, Acquisitions, Healthcare or Hospital, Real
Estate, Oil & Gas, IT, technology, transport, mining,marine
transportation and manufacturing, Education, hotels, etc. We are
willing to finance your projects. We have developed a new funding
method that does not take longer to receive funding from our
customers. If you are seriously pursuing Foreign Direct Investment or
Joint Venture for your projects in any of the sectors above or are you
seeking a Loan to expand your Business or seeking funds to finance
your business or project ? We are willing to fund your business and we
would like you to provide us with your comprehensive business plan for
our team of investment experts to review. Kindly contact me with below
email: yousefahmedalgosaibi(a)consultant.com
Regards
Mr. Yousef Ahmed
If the core is left to remove the LEDs via devm_, it is performed too
late, after the PHY driver is removed from the PHY. This results in
dereferencing a NULL pointer when the LED core tries to turn the LED
off before destroying the LED.
Manually unregister the LEDs at a safe point in phy_remove.
Cc: stable(a)vger.kernel.org
Reported-by: Florian Fainelli <f.fainelli(a)gmail.com>
Suggested-by: Florian Fainelli <f.fainelli(a)gmail.com>
Fixes: 01e5b728e9e4 ("net: phy: Add a binding for PHY LEDs")
Signed-off-by: Andrew Lunn <andrew(a)lunn.ch>
---
drivers/net/phy/phy_device.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 17d0d0555a79..53598210be6c 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3021,6 +3021,15 @@ static int phy_led_blink_set(struct led_classdev *led_cdev,
return err;
}
+static void phy_leds_unregister(struct phy_device *phydev)
+{
+ struct phy_led *phyled;
+
+ list_for_each_entry(phyled, &phydev->leds, list) {
+ led_classdev_unregister(&phyled->led_cdev);
+ }
+}
+
static int of_phy_led(struct phy_device *phydev,
struct device_node *led)
{
@@ -3054,7 +3063,7 @@ static int of_phy_led(struct phy_device *phydev,
init_data.fwnode = of_fwnode_handle(led);
init_data.devname_mandatory = true;
- err = devm_led_classdev_register_ext(dev, cdev, &init_data);
+ err = led_classdev_register_ext(dev, cdev, &init_data);
if (err)
return err;
@@ -3083,6 +3092,7 @@ static int of_phy_leds(struct phy_device *phydev)
err = of_phy_led(phydev, led);
if (err) {
of_node_put(led);
+ phy_leds_unregister(phydev);
return err;
}
}
@@ -3305,6 +3315,9 @@ static int phy_remove(struct device *dev)
cancel_delayed_work_sync(&phydev->state_queue);
+ if (IS_ENABLED(CONFIG_PHYLIB_LEDS))
+ phy_leds_unregister(phydev);
+
phydev->state = PHY_DOWN;
sfp_bus_del_upstream(phydev->sfp_bus);
--
2.40.1
Fix the test for the AST2200 in the DRAM initialization. The value
in ast->chip has to be compared against an enum constant instead of
a numerical value.
This bug got introduced when the driver was first imported into the
kernel.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: 312fec1405dd ("drm: Initial KMS driver for AST (ASpeed Technologies) 2000 series (v2)")
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v3.5+
Reviewed-by: Sui Jingfeng <suijingfeng(a)loongson.cn>
Reviewed-by: Jocelyn Falempe <jfalempe(a)redhat.com>
Tested-by: Jocelyn Falempe <jfalempe(a)redhat.com> # AST2600
---
drivers/gpu/drm/ast/ast_post.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ast/ast_post.c b/drivers/gpu/drm/ast/ast_post.c
index a005aec18a020..0262aaafdb1c5 100644
--- a/drivers/gpu/drm/ast/ast_post.c
+++ b/drivers/gpu/drm/ast/ast_post.c
@@ -291,7 +291,7 @@ static void ast_init_dram_reg(struct drm_device *dev)
;
} while (ast_read32(ast, 0x10100) != 0xa8);
} else {/* AST2100/1100 */
- if (ast->chip == AST2100 || ast->chip == 2200)
+ if (ast->chip == AST2100 || ast->chip == AST2200)
dram_reg_info = ast2100_dram_table_data;
else
dram_reg_info = ast1100_dram_table_data;
--
2.41.0
The 'qcom_swrm_ctrl->pconfig' has size of QCOM_SDW_MAX_PORTS (14),
however we index it starting from 1, not 0, to match real port numbers.
This can lead to writing port config past 'pconfig' bounds and
overwriting next member of 'qcom_swrm_ctrl' struct. Reported also by
smatch:
drivers/soundwire/qcom.c:1269 qcom_swrm_get_port_config() error: buffer overflow 'ctrl->pconfig' 14 <= 14
Fixes: 9916c02ccd74 ("soundwire: qcom: cleanup internal port config indexing")
Cc: <stable(a)vger.kernel.org>
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/202305201301.sCJ8UDKV-lkp@intel.com/
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
drivers/soundwire/qcom.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
index 7cb1b7eba814..88a772075907 100644
--- a/drivers/soundwire/qcom.c
+++ b/drivers/soundwire/qcom.c
@@ -202,7 +202,8 @@ struct qcom_swrm_ctrl {
u32 intr_mask;
u8 rcmd_id;
u8 wcmd_id;
- struct qcom_swrm_port_config pconfig[QCOM_SDW_MAX_PORTS];
+ /* Port numbers are 1 - 14 */
+ struct qcom_swrm_port_config pconfig[QCOM_SDW_MAX_PORTS + 1];
struct sdw_stream_runtime *sruntime[SWRM_MAX_DAIS];
enum sdw_slave_status status[SDW_MAX_DEVICES + 1];
int (*reg_read)(struct qcom_swrm_ctrl *ctrl, int reg, u32 *val);
--
2.34.1
From: Jim Wylder <jwylder(a)google.com>
[ Upstream commit 3981514180c987a79ea98f0ae06a7cbf58a9ac0f ]
Currently, when regmap_raw_write() splits the data, it uses the
max_raw_write value defined for the bus. For any bus that includes
the target register address in the max_raw_write value, the chunked
transmission will always exceed the maximum transmission length.
To avoid this problem, subtract the length of the register and the
padding from the maximum transmission.
Signed-off-by: Jim Wylder <jwylder(a)google.com
Link: https://lore.kernel.org/r/20230517152444.3690870-2-jwylder@google.com
Signed-off-by: Mark Brown <broonie(a)kernel.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/base/regmap/regmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index 55a30afc14a00..2a3c3dfefdcec 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -1998,6 +1998,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
size_t val_count = val_len / val_bytes;
size_t chunk_count, chunk_bytes;
size_t chunk_regs = val_count;
+ size_t max_data = map->max_raw_write - map->format.reg_bytes -
+ map->format.pad_bytes;
int ret, i;
if (!val_count)
@@ -2005,8 +2007,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
if (map->use_single_write)
chunk_regs = 1;
- else if (map->max_raw_write && val_len > map->max_raw_write)
- chunk_regs = map->max_raw_write / val_bytes;
+ else if (map->max_raw_write && val_len > max_data)
+ chunk_regs = max_data / val_bytes;
chunk_count = val_count / chunk_regs;
chunk_bytes = chunk_regs * val_bytes;
--
2.39.2
Hi,
Some (consumer) Phoenix laptops are showing up on the market that don't
have discrete TPMs and are choosing not to enable the AMD security
processor firmware TPM (fTPM).
In these laptops they offer Pluton, and are relying upon Pluton for TPM
functionality.
This was introduced in kernel 6.3 with:
4d2732882703 ("tpm_crb: Add support for CRB devices based on Pluton")
I double checked with a backport of this to 6.1.y and at least basic TPM
functionality does work properly.
Could this be brought back to 6.1.y so that users with these laptops
have TPM working with the LTS kernel?
Thanks,
The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.
Use u64 as type for sector address and size to allow using disks up to
2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD
format allows to specify disk sizes up to 2^128 bytes (though native
OS limitations reduce this somewhat, to max 2^68 bytes), so check for
u64 overflow carefully to protect against overflowing sector_t.
Bail out if sector addresses overflow 32 bits on kernels without LBD
support.
This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow <jdow(a)earthlink.net>. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted (now resubmitted as patch 1 in this series).
This patch adds additional error checking and warning messages.
Reported-by: Martin Steigerwald <Martin(a)lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin(a)lichtvoll.de>
Cc: <stable(a)vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch(a)infradead.org>
---
Changes from RFC:
- use u64 instead of sector_t, since that may be u32 without LBD support
- check multiplication overflows each step - 3 u32 values may exceed u64!
- warn against use on AmigaDOS if partition data overflow u32 sector count.
- warn if partition CylBlocks larger than what's stored in the RDSK header.
- bail out if we were to overflow sector_t (32 or 64 bit).
Changes from v1:
Kars de Jong:
- use defines for magic offsets in DosEnvec struct
Geert Uytterhoeven:
- use u64 cast for multiplications of u32 numbers
- use array3_size for overflow checks
- change pr_err to pr_warn
- discontinue use of pr_cont
- reword log messages
- drop redundant nr_sects overflow test
- warn against 32 bit overflow for each affected partition
- skip partitions that overflow sector_t size instead of aborting scan
Changes from v2:
- further trim 32 bit overflow test
- correct duplicate types.h inclusion introduced in v2
Changes from v3:
- split off sector address type fix for independent review
- change blksize to unsigned
- use check_mul_overflow() instead of array3_size()
- rewrite checks to avoid 64 bit divisions in check_mul_overflow()
Changes from v5:
Geert Uytterhoeven:
- correct ineffective u64 cast to avoid 32 bit mult. overflow
- fix mult. overflow in partition block address calculation
Changes from v6:
Geert Uytterhoeven:
- don't fail hard on partition block address overflow
Changes from v7:
- replace bdevname(state->bdev, b) by state->disk->disk_name
- drop warn_no_part conditionals
- remove remaining warn_no_part
Changes from v8:
Christoph Hellwig:
- whitespace fix, drop unnecessary u64 casts
kbuild warning:
- sparse warning fix
Changes from v9:
- revert ineffective sparse warning fix, and rely on
change to annotation of rdb_CylBlocks (patch 2 this
series) instead.
- add Fixes: tags and stable backport prereq
---
block/partitions/amiga.c | 103 ++++++++++++++++++++++++++++++++-------
1 file changed, 85 insertions(+), 18 deletions(-)
diff --git a/block/partitions/amiga.c b/block/partitions/amiga.c
index 85c5c79aae48..ed222b9c901b 100644
--- a/block/partitions/amiga.c
+++ b/block/partitions/amiga.c
@@ -11,10 +11,18 @@
#define pr_fmt(fmt) fmt
#include <linux/types.h>
+#include <linux/mm_types.h>
+#include <linux/overflow.h>
#include <linux/affs_hardblocks.h>
#include "check.h"
+/* magic offsets in partition DosEnvVec */
+#define NR_HD 3
+#define NR_SECT 5
+#define LO_CYL 9
+#define HI_CYL 10
+
static __inline__ u32
checksum_block(__be32 *m, int size)
{
@@ -31,9 +39,12 @@ int amiga_partition(struct parsed_partitions *state)
unsigned char *data;
struct RigidDiskBlock *rdb;
struct PartitionBlock *pb;
- sector_t start_sect, nr_sects;
- int blk, part, res = 0;
- int blksize = 1; /* Multiplier for disk block size */
+ u64 start_sect, nr_sects;
+ sector_t blk, end_sect;
+ u32 cylblk; /* rdb_CylBlocks = nr_heads*sect_per_track */
+ u32 nr_hd, nr_sect, lo_cyl, hi_cyl;
+ int part, res = 0;
+ unsigned int blksize = 1; /* Multiplier for disk block size */
int slot = 1;
for (blk = 0; ; blk++, put_dev_sector(sect)) {
@@ -41,7 +52,7 @@ int amiga_partition(struct parsed_partitions *state)
goto rdb_done;
data = read_part_sector(state, blk, §);
if (!data) {
- pr_err("Dev %s: unable to read RDB block %d\n",
+ pr_err("Dev %s: unable to read RDB block %llu\n",
state->disk->disk_name, blk);
res = -1;
goto rdb_done;
@@ -58,12 +69,12 @@ int amiga_partition(struct parsed_partitions *state)
*(__be32 *)(data+0xdc) = 0;
if (checksum_block((__be32 *)data,
be32_to_cpu(rdb->rdb_SummedLongs) & 0x7F)==0) {
- pr_err("Trashed word at 0xd0 in block %d ignored in checksum calculation\n",
+ pr_err("Trashed word at 0xd0 in block %llu ignored in checksum calculation\n",
blk);
break;
}
- pr_err("Dev %s: RDB in block %d has bad checksum\n",
+ pr_err("Dev %s: RDB in block %llu has bad checksum\n",
state->disk->disk_name, blk);
}
@@ -80,10 +91,15 @@ int amiga_partition(struct parsed_partitions *state)
blk = be32_to_cpu(rdb->rdb_PartitionList);
put_dev_sector(sect);
for (part = 1; blk>0 && part<=16; part++, put_dev_sector(sect)) {
- blk *= blksize; /* Read in terms partition table understands */
+ /* Read in terms partition table understands */
+ if (check_mul_overflow(blk, (sector_t) blksize, &blk)) {
+ pr_err("Dev %s: overflow calculating partition block %llu! Skipping partitions %u and beyond\n",
+ state->disk->disk_name, blk, part);
+ break;
+ }
data = read_part_sector(state, blk, §);
if (!data) {
- pr_err("Dev %s: unable to read partition block %d\n",
+ pr_err("Dev %s: unable to read partition block %llu\n",
state->disk->disk_name, blk);
res = -1;
goto rdb_done;
@@ -95,19 +111,70 @@ int amiga_partition(struct parsed_partitions *state)
if (checksum_block((__be32 *)pb, be32_to_cpu(pb->pb_SummedLongs) & 0x7F) != 0 )
continue;
- /* Tell Kernel about it */
+ /* RDB gives us more than enough rope to hang ourselves with,
+ * many times over (2^128 bytes if all fields max out).
+ * Some careful checks are in order, so check for potential
+ * overflows.
+ * We are multiplying four 32 bit numbers to one sector_t!
+ */
+
+ nr_hd = be32_to_cpu(pb->pb_Environment[NR_HD]);
+ nr_sect = be32_to_cpu(pb->pb_Environment[NR_SECT]);
+
+ /* CylBlocks is total number of blocks per cylinder */
+ if (check_mul_overflow(nr_hd, nr_sect, &cylblk)) {
+ pr_err("Dev %s: heads*sects %u overflows u32, skipping partition!\n",
+ state->disk->disk_name, cylblk);
+ continue;
+ }
+
+ /* check for consistency with RDB defined CylBlocks */
+ if (cylblk > be32_to_cpu(rdb->rdb_CylBlocks)) {
+ pr_warn("Dev %s: cylblk %u > rdb_CylBlocks %u!\n",
+ state->disk->disk_name, cylblk,
+ be32_to_cpu(rdb->rdb_CylBlocks));
+ }
+
+ /* RDB allows for variable logical block size -
+ * normalize to 512 byte blocks and check result.
+ */
+
+ if (check_mul_overflow(cylblk, blksize, &cylblk)) {
+ pr_err("Dev %s: partition %u bytes per cyl. overflows u32, skipping partition!\n",
+ state->disk->disk_name, part);
+ continue;
+ }
+
+ /* Calculate partition start and end. Limit of 32 bit on cylblk
+ * guarantees no overflow occurs if LBD support is enabled.
+ */
+
+ lo_cyl = be32_to_cpu(pb->pb_Environment[LO_CYL]);
+ start_sect = ((u64) lo_cyl * cylblk);
+
+ hi_cyl = be32_to_cpu(pb->pb_Environment[HI_CYL]);
+ nr_sects = (((u64) hi_cyl - lo_cyl + 1) * cylblk);
- nr_sects = ((sector_t)be32_to_cpu(pb->pb_Environment[10]) + 1 -
- be32_to_cpu(pb->pb_Environment[9])) *
- be32_to_cpu(pb->pb_Environment[3]) *
- be32_to_cpu(pb->pb_Environment[5]) *
- blksize;
if (!nr_sects)
continue;
- start_sect = (sector_t)be32_to_cpu(pb->pb_Environment[9]) *
- be32_to_cpu(pb->pb_Environment[3]) *
- be32_to_cpu(pb->pb_Environment[5]) *
- blksize;
+
+ /* Warn user if partition end overflows u32 (AmigaDOS limit) */
+
+ if ((start_sect + nr_sects) > UINT_MAX) {
+ pr_warn("Dev %s: partition %u (%llu-%llu) needs 64 bit device support!\n",
+ state->disk->disk_name, part,
+ start_sect, start_sect + nr_sects);
+ }
+
+ if (check_add_overflow(start_sect, nr_sects, &end_sect)) {
+ pr_err("Dev %s: partition %u (%llu-%llu) needs LBD device support, skipping partition!\n",
+ state->disk->disk_name, part,
+ start_sect, end_sect);
+ continue;
+ }
+
+ /* Tell Kernel about it */
+
put_partition(state,slot++,start_sect,nr_sects);
{
/* Be even more informative to aid mounting */
--
2.17.1
The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.
Use sector_t as type for sector address and size to allow using disks
up to 2 TB without LBD support, and disks larger than 2 TB with LBD.
This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow <jdow(a)earthlink.net>. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted. This patch differs from Joanne's patch only in its use of
sector_t instead of unsigned int. No checking for overflows is done
(see patch 3 of this series for that).
Reported-by: Martin Steigerwald <Martin(a)lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin(a)lichtvoll.de>
Cc: <stable(a)vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Tested-by: Martin Steigerwald <Martin(a)lichtvoll.de>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
---
Changes from v3:
- split off change of sector address type as quick fix.
- cast to sector_t in sector address calculations.
- move overflow checking to separate patch for more thorough review.
Changes from v4:
Andreas Schwab:
- correct cast to sector_t in sector address calculations
Changes from v7:
Christoph Hellwig
- correct style issues
Changes from v9:
- add Fixes: tags and stable backport prereq
---
block/partitions/amiga.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/block/partitions/amiga.c b/block/partitions/amiga.c
index 5c8624e26a54..85c5c79aae48 100644
--- a/block/partitions/amiga.c
+++ b/block/partitions/amiga.c
@@ -31,7 +31,8 @@ int amiga_partition(struct parsed_partitions *state)
unsigned char *data;
struct RigidDiskBlock *rdb;
struct PartitionBlock *pb;
- int start_sect, nr_sects, blk, part, res = 0;
+ sector_t start_sect, nr_sects;
+ int blk, part, res = 0;
int blksize = 1; /* Multiplier for disk block size */
int slot = 1;
@@ -96,14 +97,14 @@ int amiga_partition(struct parsed_partitions *state)
/* Tell Kernel about it */
- nr_sects = (be32_to_cpu(pb->pb_Environment[10]) + 1 -
- be32_to_cpu(pb->pb_Environment[9])) *
+ nr_sects = ((sector_t)be32_to_cpu(pb->pb_Environment[10]) + 1 -
+ be32_to_cpu(pb->pb_Environment[9])) *
be32_to_cpu(pb->pb_Environment[3]) *
be32_to_cpu(pb->pb_Environment[5]) *
blksize;
if (!nr_sects)
continue;
- start_sect = be32_to_cpu(pb->pb_Environment[9]) *
+ start_sect = (sector_t)be32_to_cpu(pb->pb_Environment[9]) *
be32_to_cpu(pb->pb_Environment[3]) *
be32_to_cpu(pb->pb_Environment[5]) *
blksize;
--
2.17.1
A crash was reported in amd-sfh related to hid core initialization
before SFH initialization has run.
```
amdtp_hid_request+0x36/0x50 [amd_sfh
2e3095779aada9fdb1764f08ca578ccb14e41fe4]
sensor_hub_get_feature+0xad/0x170 [hid_sensor_hub
d6157999c9d260a1bfa6f27d4a0dc2c3e2c5654e]
hid_sensor_parse_common_attributes+0x217/0x310 [hid_sensor_iio_common
07a7935272aa9c7a28193b574580b3e953a64ec4]
hid_gyro_3d_probe+0x7f/0x2e0 [hid_sensor_gyro_3d
9f2eb51294a1f0c0315b365f335617cbaef01eab]
platform_probe+0x44/0xa0
really_probe+0x19e/0x3e0
```
Ensure that sensors have been set up before calling into
amd_sfh_get_report() or amd_sfh_set_report().
Cc: stable(a)vger.kernel.org
Cc: Linux regression tracking (Thorsten Leemhuis) <regressions(a)leemhuis.info>
Fixes: 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors are enabled, clean up")
Reported-by: Haochen Tong <linux(a)hexchain.org>
Link: https://lore.kernel.org/all/3250319.ancTxkQ2z5@zen/T/
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/hid/amd-sfh-hid/amd_sfh_client.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
index d9b7b01900b5..88f3d913eaa1 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
@@ -25,6 +25,9 @@ void amd_sfh_set_report(struct hid_device *hid, int report_id,
struct amdtp_cl_data *cli_data = hid_data->cli_data;
int i;
+ if (!cli_data->is_any_sensor_enabled)
+ return;
+
for (i = 0; i < cli_data->num_hid_devices; i++) {
if (cli_data->hid_sensor_hubs[i] == hid) {
cli_data->cur_hid_dev = i;
@@ -41,6 +44,9 @@ int amd_sfh_get_report(struct hid_device *hid, int report_id, int report_type)
struct request_list *req_list = &cli_data->req_list;
int i;
+ if (!cli_data->is_any_sensor_enabled)
+ return -ENODEV;
+
for (i = 0; i < cli_data->num_hid_devices; i++) {
if (cli_data->hid_sensor_hubs[i] == hid) {
struct request_list *new = kzalloc(sizeof(*new), GFP_KERNEL);
--
2.34.1
I was able to reproduce crash on 5.15.y kernel during COW, and
when the grandchild process attempts a write to a private page
inherited from the child process and the private page contains
a memory uncorrectable error. The way to reproduce is described
in Tony's patch, using his ras-tools/einj_mem_uc.
And the patch series fixed the panic issue in 5.15.y.
The backport has encountered trivial conflicts due to missing
dependencies, details are provided in each patch.
Please let me know whether the backport is acceptable.
Tony Luck (2):
mm, hwpoison: try to recover from copy-on write faults
mm, hwpoison: when copy-on-write hits poison, take page offline
include/linux/highmem.h | 24 ++++++++++++++++++++++++
include/linux/mm.h | 5 ++++-
mm/memory.c | 33 +++++++++++++++++++++++----------
3 files changed, 51 insertions(+), 11 deletions(-)
--
2.18.4
From: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
From: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
commit 5da7cb193db32da783a3f3e77d8b639989321d48 upstream.
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with an kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode. The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.
On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories. Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed. And
the kfree_rcu_monitor() function does in fact check for this.
Except that the kfree_rcu_monitor() function checks these pointers one
at a time. This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately. This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.
To see this, consider the following sequence of events, in which:
o Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
then is preempted.
o CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
after a later grace period. Except that "from_cset" is freed
right after the previous grace period ended, so that "from_cset"
is immediately freed. Task A resumes and references "from_cset"'s
member, after which nothing good happens.
In full detail:
CPU 0 CPU 1
---------------------- ----------------------
count_memcg_event_mm()
|rcu_read_lock() <---
|mem_cgroup_from_task()
|// css_set_ptr is the "from_cset" mentioned on CPU 1
|css_set_ptr = rcu_dereference((task)->cgroups)
|// Hard irq comes, current task is scheduled out.
cgroup_attach_task()
|cgroup_migrate()
|cgroup_migrate_execute()
|css_set_move_task(task, from_cset, to_cset, true)
|cgroup_move_task(task, to_cset)
|rcu_assign_pointer(.., to_cset)
|...
|cgroup_migrate_finish()
|put_css_set_locked(from_cset)
|from_cset->refcount return 0
|kfree_rcu(cset, rcu_head) // free from_cset after new gp
|add_ptr_to_bulk_krc_lock()
|schedule_delayed_work(&krcp->monitor_work, ..)
kfree_rcu_monitor()
|krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
|queue_rcu_work(system_wq, &krwp->rcu_work)
|if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
|call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp
// There is a perious call_rcu(.., rcu_work_rcufn)
// gp end, rcu_work_rcufn() is called.
rcu_work_rcufn()
|__queue_work(.., rwork->wq, &rwork->work);
|kfree_rcu_work()
|krwp->bulk_head_free[0] bulk is freed before new gp end!!!
|The "from_cset" is freed before new gp end.
// the task resumes some time later.
|css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.
This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.
v2: Use helper function instead of inserted code block at kfree_rcu_monitor().
[UR: backport to 5.10-stable]
[UR: Added missing need_offload_krc() function]
Fixes: 34c881745549 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d620447 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai(a)unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
---
Resending per Greg's request.
Original posting: https://lore.kernel.org/all/20230418102518.5911-1-urezki@gmail.com/
kernel/rcu/tree.c | 49 +++++++++++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 30e1d7fedb5f..eec8e2f7537e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3281,6 +3281,30 @@ static void kfree_rcu_work(struct work_struct *work)
}
}
+static bool
+need_offload_krc(struct kfree_rcu_cpu *krcp)
+{
+ int i;
+
+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ if (krcp->bkvhead[i])
+ return true;
+
+ return !!krcp->head;
+}
+
+static bool
+need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp)
+{
+ int i;
+
+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ if (krwp->bkvhead_free[i])
+ return true;
+
+ return !!krwp->head_free;
+}
+
/*
* Schedule the kfree batch RCU work to run in workqueue context after a GP.
*
@@ -3298,16 +3322,13 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
for (i = 0; i < KFREE_N_BATCHES; i++) {
krwp = &(krcp->krw_arr[i]);
- /*
- * Try to detach bkvhead or head and attach it over any
- * available corresponding free channel. It can be that
- * a previous RCU batch is in progress, it means that
- * immediately to queue another one is not possible so
- * return false to tell caller to retry.
- */
- if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
- (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
- (krcp->head && !krwp->head_free)) {
+ // Try to detach bulk_head or head and attach it, only when
+ // all channels are free. Any channel is not free means at krwp
+ // there is on-going rcu work to handle krwp's free business.
+ if (need_wait_for_krwp_work(krwp))
+ continue;
+
+ if (need_offload_krc(krcp)) {
// Channel 1 corresponds to SLAB ptrs.
// Channel 2 corresponds to vmalloc ptrs.
for (j = 0; j < FREE_N_CHANNELS; j++) {
@@ -3334,12 +3355,12 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
*/
queue_rcu_work(system_wq, &krwp->rcu_work);
}
-
- // Repeat if any "free" corresponding channel is still busy.
- if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
- repeat = true;
}
+ // Repeat if any "free" corresponding channel is still busy.
+ if (need_offload_krc(krcp))
+ repeat = true;
+
return !repeat;
}
--
2.41.0.162.gfafddb0af9-goog
This patch fixes a stable only patch, so it has no direct upstream
equivalent.
After a stable only patch to explicitly handle the '.got' section to
handle an orphan section warning from the linker, certain configurations
error when linking with ld.lld, which enables relro by default:
ld.lld: error: section: .got is not contiguous with other relro sections
This has come up with other architectures before, such as arm and arm64
in commit 0cda9bc15dfc ("ARM: 9038/1: Link with '-z norelro'") and
commit 3b92fa7485eb ("arm64: link with -z norelro regardless of
CONFIG_RELOCATABLE"). Additionally, '-z norelro' is used unconditionally
for RISC-V upstream after commit 26e7aacb83df ("riscv: Allow to
downgrade paging mode from the command line"), which alluded to this
issue for the same reason. Bring 6.3 in line with mainline and link with
'-z norelro', which resolves the above link failure.
Fixes: e6d1562dd4e9 ("riscv: vmlinux.lds.S: Explicitly handle '.got' section")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306192231.DJmWr6BX-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
arch/riscv/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index b05e833a022d..d46b6722710f 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -7,7 +7,7 @@
#
OBJCOPYFLAGS := -O binary
-LDFLAGS_vmlinux :=
+LDFLAGS_vmlinux := -z norelro
ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
LDFLAGS_vmlinux := --no-relax
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
---
base-commit: f2427f9a3730e9a1a11b69f6b767f7f2fad87523
change-id: 20230620-6-3-fix-got-relro-error-lld-397f3112860b
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
In jfs_dmap.c at line 381, BLKTODMAP is used to get a logical block
number inside dbFree(). db_l2nbperpage, which is the log2 number of
blocks per page, is passed as an argument to BLKTODMAP which uses it
for shifting.
Syzbot reported a shift out-of-bounds crash because db_l2nbperpage is
too big. This happens because the large value is set without any
validation in dbMount() at line 181.
Thus, make sure that db_l2nbperpage is correct while mounting.
Max number of blocks per page = Page size / Min block size
=> log2(Max num_block per page) = log2(Page size / Min block size)
= log2(Page size) - log2(Min block size)
=> Max db_l2nbperpage = L2PSIZE - L2MINBLOCKSIZE
Reported-and-tested-by: syzbot+d2cd27dcf8e04b232eb2(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=2a70a453331db32ed491f5cbb07e81bf2d2257…
Cc: stable(a)vger.kernel.org
Suggested-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Siddh Raman Pant <code(a)siddh.me>
---
Changes in v3:
- Fix typo in commit message (number of pages -> number of blocks per page).
Changes in v2:
- Fix upper bound as pointed out in v1 by Shaggy.
- Add an explanation for the same in commit message for completeness.
fs/jfs/jfs_dmap.c | 6 ++++++
fs/jfs/jfs_filsys.h | 2 ++
2 files changed, 8 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index a3eb1e826947..da6a2bc6bf02 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -178,7 +178,13 @@ int dbMount(struct inode *ipbmap)
dbmp_le = (struct dbmap_disk *) mp->data;
bmp->db_mapsize = le64_to_cpu(dbmp_le->dn_mapsize);
bmp->db_nfree = le64_to_cpu(dbmp_le->dn_nfree);
+
bmp->db_l2nbperpage = le32_to_cpu(dbmp_le->dn_l2nbperpage);
+ if (bmp->db_l2nbperpage > L2PSIZE - L2MINBLOCKSIZE) {
+ err = -EINVAL;
+ goto err_release_metapage;
+ }
+
bmp->db_numag = le32_to_cpu(dbmp_le->dn_numag);
if (!bmp->db_numag) {
err = -EINVAL;
diff --git a/fs/jfs/jfs_filsys.h b/fs/jfs/jfs_filsys.h
index b5d702df7111..33ef13a0b110 100644
--- a/fs/jfs/jfs_filsys.h
+++ b/fs/jfs/jfs_filsys.h
@@ -122,7 +122,9 @@
#define NUM_INODE_PER_IAG INOSPERIAG
#define MINBLOCKSIZE 512
+#define L2MINBLOCKSIZE 9
#define MAXBLOCKSIZE 4096
+#define L2MAXBLOCKSIZE 12
#define MAXFILESIZE ((s64)1 << 52)
#define JFS_LINK_MAX 0xffffffff
--
2.39.2
In jfs_dmap.c at line 381, BLKTODMAP is used to get a logical block
number inside dbFree(). db_l2nbperpage, which is the log2 number of
blocks per page, is passed as an argument to BLKTODMAP which uses it
for shifting.
Syzbot reported a shift out-of-bounds crash because db_l2nbperpage is
too big. This happens because the large value is set without any
validation in dbMount() at line 181.
Thus, make sure that db_l2nbperpage is correct while mounting.
Max number of pages = Page size / Min block size
=> log2(Max number of pages) = log2(Page size / Min block size)
= log2(Page size) - log2(Min block size)
=> Max db_l2nbperpage = L2PSIZE - L2MINBLOCKSIZE
Reported-and-tested-by: syzbot+d2cd27dcf8e04b232eb2(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=2a70a453331db32ed491f5cbb07e81bf2d2257…
Cc: stable(a)vger.kernel.org
Suggested-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Siddh Raman Pant <code(a)siddh.me>
---
Changes in v2:
- Fix upper bound as pointed out in v1 by Shaggy.
- Add an explanation for the same in commit message for completeness.
fs/jfs/jfs_dmap.c | 6 ++++++
fs/jfs/jfs_filsys.h | 2 ++
2 files changed, 8 insertions(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
index a3eb1e826947..da6a2bc6bf02 100644
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -178,7 +178,13 @@ int dbMount(struct inode *ipbmap)
dbmp_le = (struct dbmap_disk *) mp->data;
bmp->db_mapsize = le64_to_cpu(dbmp_le->dn_mapsize);
bmp->db_nfree = le64_to_cpu(dbmp_le->dn_nfree);
+
bmp->db_l2nbperpage = le32_to_cpu(dbmp_le->dn_l2nbperpage);
+ if (bmp->db_l2nbperpage > L2PSIZE - L2MINBLOCKSIZE) {
+ err = -EINVAL;
+ goto err_release_metapage;
+ }
+
bmp->db_numag = le32_to_cpu(dbmp_le->dn_numag);
if (!bmp->db_numag) {
err = -EINVAL;
diff --git a/fs/jfs/jfs_filsys.h b/fs/jfs/jfs_filsys.h
index b5d702df7111..33ef13a0b110 100644
--- a/fs/jfs/jfs_filsys.h
+++ b/fs/jfs/jfs_filsys.h
@@ -122,7 +122,9 @@
#define NUM_INODE_PER_IAG INOSPERIAG
#define MINBLOCKSIZE 512
+#define L2MINBLOCKSIZE 9
#define MAXBLOCKSIZE 4096
+#define L2MAXBLOCKSIZE 12
#define MAXFILESIZE ((s64)1 << 52)
#define JFS_LINK_MAX 0xffffffff
--
2.39.2
From: Jim Wylder <jwylder(a)google.com>
[ Upstream commit 3981514180c987a79ea98f0ae06a7cbf58a9ac0f ]
Currently, when regmap_raw_write() splits the data, it uses the
max_raw_write value defined for the bus. For any bus that includes
the target register address in the max_raw_write value, the chunked
transmission will always exceed the maximum transmission length.
To avoid this problem, subtract the length of the register and the
padding from the maximum transmission.
Signed-off-by: Jim Wylder <jwylder(a)google.com
Link: https://lore.kernel.org/r/20230517152444.3690870-2-jwylder@google.com
Signed-off-by: Mark Brown <broonie(a)kernel.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/base/regmap/regmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index 7de1f27d0323d..8359164bff903 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -2064,6 +2064,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
size_t val_count = val_len / val_bytes;
size_t chunk_count, chunk_bytes;
size_t chunk_regs = val_count;
+ size_t max_data = map->max_raw_write - map->format.reg_bytes -
+ map->format.pad_bytes;
int ret, i;
if (!val_count)
@@ -2071,8 +2073,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
if (map->use_single_write)
chunk_regs = 1;
- else if (map->max_raw_write && val_len > map->max_raw_write)
- chunk_regs = map->max_raw_write / val_bytes;
+ else if (map->max_raw_write && val_len > max_data)
+ chunk_regs = max_data / val_bytes;
chunk_count = val_count / chunk_regs;
chunk_bytes = chunk_regs * val_bytes;
--
2.39.2
The current way how lowcomms is configured is due configfs entries. Each
comms configfs entry will create a lowcomms connection. Even the local
connection itself will be stored as a lowcomms connection, although most
functionality for a local lowcomms connection struct is not necessary.
Now in some scenarios we will see that dlm_controld reports a -EEXIST
when configure a node via configfs:
... /sys/kernel/config/dlm/cluster/comms/1/addr: write failed: 17 -1
Doing a:
cat /sys/kernel/config/dlm/cluster/comms/1/addr_list
reported nothing. This was being seen on cluster with nodeid 1 and it's
local configuration. To be sure the configfs entries are in sync with
lowcomms connection structures we always call dlm_midcomms_close() to be
sure the lowcomms connection gets removed when the configfs entry gets
dropped.
Before commit 07ee38674a0b ("fs: dlm: filter ourself midcomms calls") it
was just doing this by accident and the filter by doing:
if (nodeid == dlm_our_nodeid())
return 0;
inside dlm_midcomms_close() was never been hit because drop_comm() sets
local_comm to NULL and cause that dlm_our_nodeid() returns always the
invalid nodeid 0.
Fixes: 07ee38674a0b ("fs: dlm: filter ourself midcomms calls")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
---
changes since v2:
- add fixes tag
- cc stable
fs/dlm/config.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 4246cd425671..2beceff024e3 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -532,8 +532,7 @@ static void drop_comm(struct config_group *g, struct config_item *i)
struct dlm_comm *cm = config_item_to_comm(i);
if (local_comm == cm)
local_comm = NULL;
- if (!cm->local)
- dlm_midcomms_close(cm->nodeid);
+ dlm_midcomms_close(cm->nodeid);
while (cm->addr_count--)
kfree(cm->addr[cm->addr_count]);
config_item_put(i);
--
2.31.1
Don't assume that the device is fully under the control of PCI core.
Use RMW capability accessors in link retraining which do proper locking
to avoid losing concurrent updates to the register values.
Fixes: 4ec73791a64b ("PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
pci/enumeration branch moves the link retraining code into PCI core and
also conflicts with a link retraining fix in pci/aspm. The changelog
(and patch splitting) takes the move into account by not referring to
ASPM while the change itself is not based on pci/enumeration (as per
Bjorn's preference).
---
drivers/pci/pcie/aspm.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 66d7514ca111..50e32bda4656 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -199,17 +199,14 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
unsigned long end_jiffies;
u16 reg16;
- pcie_capability_read_word(parent, PCI_EXP_LNKCTL, ®16);
- reg16 |= PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
+ pcie_capability_set_word(parent, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
if (parent->clear_retrain_link) {
/*
* Due to an erratum in some devices the Retrain Link bit
* needs to be cleared again manually to allow the link
* training to succeed.
*/
- reg16 &= ~PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
+ pcie_capability_clear_word(parent, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
}
/* Wait for link training end. Break out after waiting for timeout */
--
2.30.2
Many places in the kernel write the Link Control and Root Control PCI
Express Capability Registers without proper concurrency control and
this could result in losing the changes one of the writers intended to
make.
Add pcie_cap_lock spinlock into the struct pci_dev and use it to
protect bit changes made in the RMW capability accessors. Protect only
a selected set of registers by differentiating the RMW accessor
internally to locked/unlocked variants using a wrapper which has the
same signature as pcie_capability_clear_and_set_word(). As the
Capability Register (pos) given to the wrapper is always a constant,
the compiler should be able to simplify all the dead-code away.
The RMW locking is only added to pcie_capability_clear_and_set_word()
because so far only the Link Control Register (ASPM, hotplug, link
retraining, various drivers) and the Root Control Register (AER & PME)
require RMW locking.
Fixes: c7f486567c1d ("PCI PM: PCIe PME root port service driver")
Fixes: f12eb72a268b ("PCI/ASPM: Use PCI Express Capability accessors")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Fixes: affa48de8417 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM")
Fixes: 849a9366cba9 ("misc: rtsx: Add support new chip rts5228 mmc: rtsx: Add support MMC_CAP2_NO_MMC")
Fixes: 3d1e7aa80d1c ("misc: rtsx: Use pcie_capability_clear_and_set_word() for PCI_EXP_LNKCTL")
Fixes: c0e5f4e73a71 ("misc: rtsx: Add support for RTS5261")
Fixes: 3df4fce739e2 ("misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG")
Fixes: 121e9c6b5c4c ("misc: rtsx: modify and fix init_hw function")
Fixes: 19f3bd548f27 ("mfd: rtsx: Remove LCTLR defination")
Fixes: 773ccdfd9cc6 ("mfd: rtsx: Read vendor setting from config space")
Fixes: 8275b77a1513 ("mfd: rts5249: Add support for RTS5250S power saving")
Fixes: 5da4e04ae480 ("misc: rtsx: Add support for RTS5260")
Fixes: 0f49bfbd0f2e ("tg3: Use PCI Express Capability accessors")
Fixes: 5e7dfd0fb94a ("tg3: Prevent corruption at 10 / 100Mbps w CLKREQ")
Fixes: b726e493e8dc ("r8169: sync existing 8168 device hardware start sequences with vendor driver")
Fixes: e6de30d63eb1 ("r8169: more 8168dp support.")
Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards")
Fixes: 6f461f6c7c96 ("e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata")
Fixes: 1eae4eb2a1c7 ("e1000e: Disable L1 ASPM power savings for 82573 mobile variants")
Fixes: 8060e169e02f ("ath9k: Enable extended synch for AR9485 to fix L0s recovery issue")
Fixes: 69ce674bfa69 ("ath9k: do btcoex ASPM disabling at initialization time")
Fixes: f37f05503575 ("mt76: mt76x2e: disable pcie_aspm by default")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/pci/access.c | 20 +++++++++++++++++---
drivers/pci/probe.c | 1 +
include/linux/pci.h | 34 ++++++++++++++++++++++++++++++++--
3 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 3c230ca3de58..0b2e90d2f04f 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -497,8 +497,8 @@ int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val)
}
EXPORT_SYMBOL(pcie_capability_write_dword);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set)
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
{
int ret;
u16 val;
@@ -512,7 +512,21 @@ int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
return ret;
}
-EXPORT_SYMBOL(pcie_capability_clear_and_set_word);
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_unlocked);
+
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&dev->pcie_cap_lock, flags);
+ ret = pcie_capability_clear_and_set_word_unlocked(dev, pos, clear, set);
+ spin_unlock_irqrestore(&dev->pcie_cap_lock, flags);
+
+ return ret;
+}
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_locked);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0b2826c4a832..53ac0d3287a8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2318,6 +2318,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
.end = -1,
};
+ spin_lock_init(&dev->pcie_cap_lock);
#ifdef CONFIG_PCI_MSI
raw_spin_lock_init(&dev->msi_lock);
#endif
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60b8772b5bd4..ab7682ed172f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -467,6 +467,7 @@ struct pci_dev {
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
+ spinlock_t pcie_cap_lock; /* Protects RMW ops in capability accessors */
u32 saved_config_space[16]; /* Config space saved at suspend time */
struct hlist_head saved_cap_space;
int rom_attr_enabled; /* Display of ROM attribute enabled? */
@@ -1217,11 +1218,40 @@ int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val);
int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val);
int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val);
int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set);
+/**
+ * pcie_capability_clear_and_set_word - RMW accessor for PCI Express Capability Registers
+ * @dev: PCI device structure of the PCI Express device
+ * @pos: PCI Express Capability Register
+ * @clear: Clear bitmask
+ * @set: Set bitmask
+ *
+ * Perform a Read-Modify-Write (RMW) operation using @clear and @set
+ * bitmasks on PCI Express Capability Register at @pos. Certain PCI Express
+ * Capability Registers are accessed concurrently in RMW fashion, hence
+ * require locking which is handled transparently to the caller.
+ */
+static inline int pcie_capability_clear_and_set_word(struct pci_dev *dev,
+ int pos,
+ u16 clear, u16 set)
+{
+ switch (pos) {
+ case PCI_EXP_LNKCTL:
+ case PCI_EXP_RTCTL:
+ return pcie_capability_clear_and_set_word_locked(dev, pos,
+ clear, set);
+ default:
+ return pcie_capability_clear_and_set_word_unlocked(dev, pos,
+ clear, set);
+ }
+}
+
static inline int pcie_capability_set_word(struct pci_dev *dev, int pos,
u16 set)
{
--
2.30.2
Currently, associating a loop device with a different file descriptor
does not increment its diskseq. This allows the following race
condition:
1. Program X opens a loop device
2. Program X gets the diskseq of the loop device.
3. Program X associates a file with the loop device.
4. Program X passes the loop device major, minor, and diskseq to
something.
5. Program X exits.
6. Program Y detaches the file from the loop device.
7. Program Y attaches a different file to the loop device.
8. The opener finally gets around to opening the loop device and checks
that the diskseq is what it expects it to be. Even though the
diskseq is the expected value, the result is that the opener is
accessing the wrong file.
From discussions with Christoph Hellwig, it appears that
disk_force_media_change() was supposed to call inc_diskseq(), but in
fact it does not. Adding a Fixes: tag to indicate this. Christoph's
Reported-by is because he stated that disk_force_media_change()
calls inc_diskseq(), which is what led me to discover that it should but
does not.
Reported-by: Christoph Hellwig <hch(a)infradead.org>
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Fixes: e6138dc12de9 ("block: add a helper to raise a media changed event")
Cc: stable(a)vger.kernel.org # 5.15+
---
block/disk-events.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/disk-events.c b/block/disk-events.c
index aee25a7e1ab7de8cc82b3c3774e83489d3a86ff9..450c2cbe23d56cc0fa8fa40db9866cdae0e7a626 100644
--- a/block/disk-events.c
+++ b/block/disk-events.c
@@ -307,6 +307,7 @@ bool disk_force_media_change(struct gendisk *disk, unsigned int events)
if (!(events & DISK_EVENT_MEDIA_CHANGE))
return false;
+ inc_diskseq(disk);
if (__invalidate_device(disk->part0, true))
pr_warn("VFS: busy inodes on changed media %s\n",
disk->disk_name);
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
The following commit has been merged into the x86/core branch of tip:
Commit-ID: 9b040453d4440659f33dc6f0aa26af418ebfe70b
Gitweb: https://git.kernel.org/tip/9b040453d4440659f33dc6f0aa26af418ebfe70b
Author: Tony Battersby <tonyb(a)cybernetics.com>
AuthorDate: Thu, 15 Jun 2023 22:33:52 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 20 Jun 2023 14:51:46 +02:00
x86/smp: Dont access non-existing CPUID leaf
stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel
CPUs return the content of the highest supported leaf when a non-existing
leaf is read, while AMD CPUs return all zeros for unsupported leafs.
So the result of the test on Intel CPUs is lottery.
While harmless it's incorrect and causes the conditional wbinvd() to be
issued where not required.
Check whether the leaf is supported before reading it.
[ tglx: Adjusted changelog ]
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Signed-off-by: Tony Battersby <tonyb(a)cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.…
Link: https://lore.kernel.org/r/20230615193330.322186388@linutronix.de
---
arch/x86/kernel/process.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 05924bc..ff9b80a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -763,6 +763,7 @@ struct cpumask cpus_stop_mask;
void __noreturn stop_this_cpu(void *dummy)
{
+ struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
unsigned int cpu = smp_processor_id();
local_irq_disable();
@@ -777,7 +778,7 @@ void __noreturn stop_this_cpu(void *dummy)
*/
set_cpu_online(cpu, false);
disable_local_APIC();
- mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
+ mcheck_cpu_clear(c);
/*
* Use wbinvd on processors that support SME. This provides support
@@ -791,7 +792,7 @@ void __noreturn stop_this_cpu(void *dummy)
* Test the CPUID bit directly because the machine might've cleared
* X86_FEATURE_SME due to cmdline options.
*/
- if (cpuid_eax(0x8000001f) & BIT(0))
+ if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
native_wbinvd();
/*
The following commit has been merged into the x86/core branch of tip:
Commit-ID: 1f5e7eb7868e42227ac426c96d437117e6e06e8e
Gitweb: https://git.kernel.org/tip/1f5e7eb7868e42227ac426c96d437117e6e06e8e
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Wed, 26 Apr 2023 18:37:00 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 20 Jun 2023 14:51:46 +02:00
x86/smp: Make stop_other_cpus() more robust
Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.
That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.
That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:
CPU0 CPU1
stop_other_cpus()
send_IPIs(REBOOT); stop_this_cpu()
while (num_online_cpus() > 1); set_online(false);
proceed... -> hang
wbinvd()
WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.
But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.
This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.
Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs and CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().
The cpumask cannot plug all holes either, but it's better than a raw
counter and allows to restrict the NMI fallback IPI to be sent only the
CPUs which have not reported within the timeout window.
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby <tonyb(a)cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ashok Raj <ashok.raj(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetic…
Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
---
arch/x86/include/asm/cpu.h | 2 +-
arch/x86/kernel/process.c | 23 ++++++++++++--
arch/x86/kernel/smp.c | 62 ++++++++++++++++++++++++-------------
3 files changed, 64 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 78796b9..9ba3c3d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -98,4 +98,6 @@ extern u64 x86_read_arch_cap_msr(void);
int intel_find_matching_signature(void *mc, unsigned int csig, int cpf);
int intel_microcode_sanity_check(void *mc, bool print_err, int hdr_type);
+extern struct cpumask cpus_stop_mask;
+
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index dac41a0..05924bc 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -759,13 +759,23 @@ bool xen_set_default_idle(void)
}
#endif
+struct cpumask cpus_stop_mask;
+
void __noreturn stop_this_cpu(void *dummy)
{
+ unsigned int cpu = smp_processor_id();
+
local_irq_disable();
+
/*
- * Remove this CPU:
+ * Remove this CPU from the online mask and disable it
+ * unconditionally. This might be redundant in case that the reboot
+ * vector was handled late and stop_other_cpus() sent an NMI.
+ *
+ * According to SDM and APM NMIs can be accepted even after soft
+ * disabling the local APIC.
*/
- set_cpu_online(smp_processor_id(), false);
+ set_cpu_online(cpu, false);
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
@@ -783,6 +793,15 @@ void __noreturn stop_this_cpu(void *dummy)
*/
if (cpuid_eax(0x8000001f) & BIT(0))
native_wbinvd();
+
+ /*
+ * This brings a cache line back and dirties it, but
+ * native_stop_other_cpus() will overwrite cpus_stop_mask after it
+ * observed that all CPUs reported stop. This write will invalidate
+ * the related cache line on this CPU.
+ */
+ cpumask_clear_cpu(cpu, &cpus_stop_mask);
+
for (;;) {
/*
* Use native_halt() so that memory contents don't change
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 375b33e..935bc65 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -27,6 +27,7 @@
#include <asm/mmu_context.h>
#include <asm/proto.h>
#include <asm/apic.h>
+#include <asm/cpu.h>
#include <asm/idtentry.h>
#include <asm/nmi.h>
#include <asm/mce.h>
@@ -146,31 +147,43 @@ static int register_stop_handler(void)
static void native_stop_other_cpus(int wait)
{
- unsigned long flags;
- unsigned long timeout;
+ unsigned int cpu = smp_processor_id();
+ unsigned long flags, timeout;
if (reboot_force)
return;
- /*
- * Use an own vector here because smp_call_function
- * does lots of things not suitable in a panic situation.
- */
+ /* Only proceed if this is the first CPU to reach this code */
+ if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1)
+ return;
/*
- * We start by using the REBOOT_VECTOR irq.
- * The irq is treated as a sync point to allow critical
- * regions of code on other cpus to release their spin locks
- * and re-enable irqs. Jumping straight to an NMI might
- * accidentally cause deadlocks with further shutdown/panic
- * code. By syncing, we give the cpus up to one second to
- * finish their work before we force them off with the NMI.
+ * 1) Send an IPI on the reboot vector to all other CPUs.
+ *
+ * The other CPUs should react on it after leaving critical
+ * sections and re-enabling interrupts. They might still hold
+ * locks, but there is nothing which can be done about that.
+ *
+ * 2) Wait for all other CPUs to report that they reached the
+ * HLT loop in stop_this_cpu()
+ *
+ * 3) If #2 timed out send an NMI to the CPUs which did not
+ * yet report
+ *
+ * 4) Wait for all other CPUs to report that they reached the
+ * HLT loop in stop_this_cpu()
+ *
+ * #3 can obviously race against a CPU reaching the HLT loop late.
+ * That CPU will have reported already and the "have all CPUs
+ * reached HLT" condition will be true despite the fact that the
+ * other CPU is still handling the NMI. Again, there is no
+ * protection against that as "disabled" APICs still respond to
+ * NMIs.
*/
- if (num_online_cpus() > 1) {
- /* did someone beat us here? */
- if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
- return;
+ cpumask_copy(&cpus_stop_mask, cpu_online_mask);
+ cpumask_clear_cpu(cpu, &cpus_stop_mask);
+ if (!cpumask_empty(&cpus_stop_mask)) {
/* sync above data before sending IRQ */
wmb();
@@ -183,12 +196,12 @@ static void native_stop_other_cpus(int wait)
* CPUs reach shutdown state.
*/
timeout = USEC_PER_SEC;
- while (num_online_cpus() > 1 && timeout--)
+ while (!cpumask_empty(&cpus_stop_mask) && timeout--)
udelay(1);
}
/* if the REBOOT_VECTOR didn't work, try with the NMI */
- if (num_online_cpus() > 1) {
+ if (!cpumask_empty(&cpus_stop_mask)) {
/*
* If NMI IPI is enabled, try to register the stop handler
* and send the IPI. In any case try to wait for the other
@@ -200,7 +213,8 @@ static void native_stop_other_cpus(int wait)
pr_emerg("Shutting down cpus with NMI\n");
- apic_send_IPI_allbutself(NMI_VECTOR);
+ for_each_cpu(cpu, &cpus_stop_mask)
+ apic->send_IPI(cpu, NMI_VECTOR);
}
/*
* Don't wait longer than 10 ms if the caller didn't
@@ -208,7 +222,7 @@ static void native_stop_other_cpus(int wait)
* one or more CPUs do not reach shutdown state.
*/
timeout = USEC_PER_MSEC * 10;
- while (num_online_cpus() > 1 && (wait || timeout--))
+ while (!cpumask_empty(&cpus_stop_mask) && (wait || timeout--))
udelay(1);
}
@@ -216,6 +230,12 @@ static void native_stop_other_cpus(int wait)
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
local_irq_restore(flags);
+
+ /*
+ * Ensure that the cpus_stop_mask cache lines are invalidated on
+ * the other CPUs. See comment vs. SME in stop_this_cpu().
+ */
+ cpumask_clear(&cpus_stop_mask);
}
/*
The following commit has been merged into the x86/core branch of tip:
Commit-ID: 2affa6d6db28855e6340b060b809c23477aa546e
Gitweb: https://git.kernel.org/tip/2affa6d6db28855e6340b060b809c23477aa546e
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Thu, 15 Jun 2023 22:33:54 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 20 Jun 2023 14:51:46 +02:00
x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
The wmb()s before sending the IPIs are not synchronizing anything.
If at all then the apic IPI functions have to provide or act as appropriate
barriers.
Remove these cargo cult barriers which have no explanation of what they are
synchronizing.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.378358382@linutronix.de
---
arch/x86/kernel/smp.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 935bc65..d842875 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -184,9 +184,6 @@ static void native_stop_other_cpus(int wait)
cpumask_clear_cpu(cpu, &cpus_stop_mask);
if (!cpumask_empty(&cpus_stop_mask)) {
- /* sync above data before sending IRQ */
- wmb();
-
apic_send_IPI_allbutself(REBOOT_VECTOR);
/*
@@ -208,9 +205,6 @@ static void native_stop_other_cpus(int wait)
* CPUs to stop.
*/
if (!smp_no_nmi_ipi && !register_stop_handler()) {
- /* Sync above data before sending IRQ */
- wmb();
-
pr_emerg("Shutting down cpus with NMI\n");
for_each_cpu(cpu, &cpus_stop_mask)
The following commit has been merged into the x86/core branch of tip:
Commit-ID: f9c9987bf52f4e42e940ae217333ebb5a4c3b506
Gitweb: https://git.kernel.org/tip/f9c9987bf52f4e42e940ae217333ebb5a4c3b506
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Thu, 15 Jun 2023 22:33:55 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 20 Jun 2023 14:51:47 +02:00
x86/smp: Use dedicated cache-line for mwait_play_dead()
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all what is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj(a)intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
---
arch/x86/kernel/smpboot.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce..c5ac5d7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -101,6 +101,17 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
EXPORT_PER_CPU_SYMBOL(cpu_info);
+struct mwait_cpu_dead {
+ unsigned int control;
+ unsigned int status;
+};
+
+/*
+ * Cache line aligned data for mwait_play_dead(). Separate on purpose so
+ * that it's unlikely to be touched by other CPUs.
+ */
+static DEFINE_PER_CPU_ALIGNED(struct mwait_cpu_dead, mwait_cpu_dead);
+
/* Logical package management. We might want to allocate that dynamically */
unsigned int __max_logical_packages __read_mostly;
EXPORT_SYMBOL(__max_logical_packages);
@@ -1758,10 +1769,10 @@ EXPORT_SYMBOL_GPL(cond_wakeup_cpu0);
*/
static inline void mwait_play_dead(void)
{
+ struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
unsigned int eax, ebx, ecx, edx;
unsigned int highest_cstate = 0;
unsigned int highest_subcstate = 0;
- void *mwait_ptr;
int i;
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
@@ -1796,13 +1807,6 @@ static inline void mwait_play_dead(void)
(highest_subcstate - 1);
}
- /*
- * This should be a memory location in a cache line which is
- * unlikely to be touched by other processors. The actual
- * content is immaterial as it is not actually modified in any way.
- */
- mwait_ptr = ¤t_thread_info()->flags;
-
wbinvd();
while (1) {
@@ -1814,9 +1818,9 @@ static inline void mwait_play_dead(void)
* case where we return around the loop.
*/
mb();
- clflush(mwait_ptr);
+ clflush(md);
mb();
- __monitor(mwait_ptr, 0, 0);
+ __monitor(md, 0, 0);
mb();
__mwait(eax, 0);
The following commit has been merged into the x86/core branch of tip:
Commit-ID: d7893093a7417527c0d73c9832244e65c9d0114f
Gitweb: https://git.kernel.org/tip/d7893093a7417527c0d73c9832244e65c9d0114f
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Thu, 15 Jun 2023 22:33:57 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 20 Jun 2023 14:51:47 +02:00
x86/smp: Cure kexec() vs. mwait_play_dead() breakage
TLDR: It's a mess.
When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.
The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.
Cure this by bringing the offlined CPUs out of MWAIT into HLT.
Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.
That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as
those make it come out of HLT.
A follow up change will put them into INIT, which protects at least against
NMI and SMI.
Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Ashok Raj <ashok.raj(a)intel.com>
Reviewed-by: Ashok Raj <ashok.raj(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
---
arch/x86/include/asm/smp.h | 2 +-
arch/x86/kernel/smp.c | 5 +++-
arch/x86/kernel/smpboot.c | 59 +++++++++++++++++++++++++++++++++++++-
3 files changed, 66 insertions(+)
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4e91054..d4ce5cb 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -132,6 +132,8 @@ void wbinvd_on_cpu(int cpu);
int wbinvd_on_all_cpus(void);
void cond_wakeup_cpu0(void);
+void smp_kick_mwait_play_dead(void);
+
void native_smp_send_reschedule(int cpu);
void native_send_call_func_ipi(const struct cpumask *mask);
void native_send_call_func_single_ipi(int cpu);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index d842875..174d623 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -21,6 +21,7 @@
#include <linux/interrupt.h>
#include <linux/cpu.h>
#include <linux/gfp.h>
+#include <linux/kexec.h>
#include <asm/mtrr.h>
#include <asm/tlbflush.h>
@@ -157,6 +158,10 @@ static void native_stop_other_cpus(int wait)
if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1)
return;
+ /* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */
+ if (kexec_in_progress)
+ smp_kick_mwait_play_dead();
+
/*
* 1) Send an IPI on the reboot vector to all other CPUs.
*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c5ac5d7..483df04 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -53,6 +53,7 @@
#include <linux/tboot.h>
#include <linux/gfp.h>
#include <linux/cpuidle.h>
+#include <linux/kexec.h>
#include <linux/numa.h>
#include <linux/pgtable.h>
#include <linux/overflow.h>
@@ -106,6 +107,9 @@ struct mwait_cpu_dead {
unsigned int status;
};
+#define CPUDEAD_MWAIT_WAIT 0xDEADBEEF
+#define CPUDEAD_MWAIT_KEXEC_HLT 0x4A17DEAD
+
/*
* Cache line aligned data for mwait_play_dead(). Separate on purpose so
* that it's unlikely to be touched by other CPUs.
@@ -173,6 +177,10 @@ static void smp_callin(void)
{
int cpuid;
+ /* Mop up eventual mwait_play_dead() wreckage */
+ this_cpu_write(mwait_cpu_dead.status, 0);
+ this_cpu_write(mwait_cpu_dead.control, 0);
+
/*
* If waken up by an INIT in an 82489DX configuration
* cpu_callout_mask guarantees we don't get here before
@@ -1807,6 +1815,10 @@ static inline void mwait_play_dead(void)
(highest_subcstate - 1);
}
+ /* Set up state for the kexec() hack below */
+ md->status = CPUDEAD_MWAIT_WAIT;
+ md->control = CPUDEAD_MWAIT_WAIT;
+
wbinvd();
while (1) {
@@ -1824,10 +1836,57 @@ static inline void mwait_play_dead(void)
mb();
__mwait(eax, 0);
+ if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
+ /*
+ * Kexec is about to happen. Don't go back into mwait() as
+ * the kexec kernel might overwrite text and data including
+ * page tables and stack. So mwait() would resume when the
+ * monitor cache line is written to and then the CPU goes
+ * south due to overwritten text, page tables and stack.
+ *
+ * Note: This does _NOT_ protect against a stray MCE, NMI,
+ * SMI. They will resume execution at the instruction
+ * following the HLT instruction and run into the problem
+ * which this is trying to prevent.
+ */
+ WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
+ while(1)
+ native_halt();
+ }
+
cond_wakeup_cpu0();
}
}
+/*
+ * Kick all "offline" CPUs out of mwait on kexec(). See comment in
+ * mwait_play_dead().
+ */
+void smp_kick_mwait_play_dead(void)
+{
+ u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT;
+ struct mwait_cpu_dead *md;
+ unsigned int cpu, i;
+
+ for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) {
+ md = per_cpu_ptr(&mwait_cpu_dead, cpu);
+
+ /* Does it sit in mwait_play_dead() ? */
+ if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT)
+ continue;
+
+ /* Wait up to 5ms */
+ for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) {
+ /* Bring it out of mwait */
+ WRITE_ONCE(md->control, newstate);
+ udelay(5);
+ }
+
+ if (READ_ONCE(md->status) != newstate)
+ pr_err_once("CPU%u is stuck in mwait_play_dead()\n", cpu);
+ }
+}
+
void __noreturn hlt_play_dead(void)
{
if (__this_cpu_read(cpu_info.x86) >= 4)
The errata sheets for both ksz9477 and ksz9567 begin with
IMPORTANT NOTE
Multiple errata workarounds in this document call for changing PHY
registers for each PHY port. PHY registers 0x0 to 0x1F are in the
address range 0xN100 to 0xN13F, while indirect (MMD) PHY registers
are accessed via the PHY MMD Setup Register and the PHY MMD Data
Register.
Before configuring the PHY MMD registers, it is necessary to set the
PHY to 100 Mbps speed with auto-negotiation disabled by writing to
register 0xN100-0xN101. After writing the MMD registers, and after
all errata workarounds that involve PHY register settings, write
register 0xN100-0xN101 again to enable and restart auto-negotiation.
Without that explicit auto-neg restart, we do sometimes have problems
establishing link.
Rather than writing back the hardcoded 0x1340 value the errata sheet
suggests (which likely just corresponds to the most common strap
configuration), restore the original value, setting the
PORT_AUTO_NEG_RESTART bit if PORT_AUTO_NEG_ENABLE is set.
Fixes: 1fc33199185d ("net: dsa: microchip: Add PHY errata workarounds")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
---
While I do believe this is a fix, I don't think it's post-rc7
material, hence targeting net-next with cc stable.
drivers/net/dsa/microchip/ksz9477.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index bf13d47c26cf..9a712ea71ee7 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -902,6 +902,16 @@ static void ksz9477_port_mmd_write(struct ksz_device *dev, int port,
static void ksz9477_phy_errata_setup(struct ksz_device *dev, int port)
{
+ u16 cr;
+
+ /* Errata document says the PHY must be configured to 100Mbps
+ * with auto-neg disabled before configuring the PHY MMD
+ * registers.
+ */
+ ksz_pread16(dev, port, REG_PORT_PHY_CTRL, &cr);
+ ksz_pwrite16(dev, port, REG_PORT_PHY_CTRL,
+ PORT_SPEED_100MBIT | PORT_FULL_DUPLEX);
+
/* Apply PHY settings to address errata listed in
* KSZ9477, KSZ9897, KSZ9896, KSZ9567, KSZ8565
* Silicon Errata and Data Sheet Clarification documents:
@@ -943,6 +953,13 @@ static void ksz9477_phy_errata_setup(struct ksz_device *dev, int port)
ksz9477_port_mmd_write(dev, port, 0x1c, 0x1d, 0xe7ff);
ksz9477_port_mmd_write(dev, port, 0x1c, 0x1e, 0xefff);
ksz9477_port_mmd_write(dev, port, 0x1c, 0x20, 0xeeee);
+
+ /* Restore PHY CTRL register, restart auto-negotiation if
+ * enabled in the original value.
+ */
+ if (cr & PORT_AUTO_NEG_ENABLE)
+ cr |= PORT_AUTO_NEG_RESTART;
+ ksz_pwrite16(dev, port, REG_PORT_PHY_CTRL, cr);
}
void ksz9477_get_caps(struct ksz_device *dev, int port,
--
2.37.2
On Sun 2023-06-18 07:43:10, Manuel Lauss wrote:
> On Fri, Jun 16, 2023 at 9:33 PM Pavel Machek <pavel(a)denx.de> wrote:
>
> > Hi!
> >
> > > From: Manuel Lauss <manuel.lauss(a)gmail.com>
> > >
> > > [ Upstream commit 2d645604f69f3a772d58ead702f9a8e84ab2b342 ]
> > >
> > > Various fixes for the Au1200/Au1550/Au1300 DBDMA2 code:
> > >
> > > - skip cache invalidation if chip has working coherency circuitry.
> > > - invalidate KSEG0-portion of the (physical) data address.
> > > - force the dma channel doorbell write out to bus immediately with
> > > a sync.
> > >
> > > Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
> > > Signed-off-by: Sasha Levin <sashal(a)kernel.org>
> >
> > I believe author's signoff is missing here.
> >
>
> As the author, I say this patch should not be applied to 4.xx at all. Same
> for my other 2 MIPS patches.
Thanks for info, where is the threshold, do we need them for 5.10?
Sasha, please drop.
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
commit e18eb8783ec4949adebc7d7b0fdb65f65bfeefd9 upstream.
Currently the tracing_reset_all_online_cpus() requires the
trace_types_lock held. But only one caller of this function actually has
that lock held before calling it, and the other just takes the lock so
that it can call it. More users of this function is needed where the lock
is not held.
Add a tracing_reset_all_online_cpus_unlocked() function for the one use
case that calls it without being held, and also add a lockdep_assert to
make sure it is held when called.
Then have tracing_reset_all_online_cpus() take the lock internally, such
that callers do not need to worry about taking it.
Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
[Refers to commit message of 1603feac154ff38514e8354e3079a455eb4801e2,
this patch is pre-depended, and tracing_reset_all_online_cpus() should
be called after trace_types_lock is held as its comment describes.]
Fixes: 1603feac154f ("tracing: Free buffers when a used dynamic event is removed")
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
---
kernel/trace/trace.c | 11 ++++++++++-
kernel/trace/trace.h | 1 +
kernel/trace/trace_events.c | 2 +-
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d068124815bc..219cd2c81936 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1931,10 +1931,12 @@ void tracing_reset_online_cpus(struct trace_buffer *buf)
}
/* Must have trace_types_lock held */
-void tracing_reset_all_online_cpus(void)
+void tracing_reset_all_online_cpus_unlocked(void)
{
struct trace_array *tr;
+ lockdep_assert_held(&trace_types_lock);
+
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
if (!tr->clear_trace)
continue;
@@ -1946,6 +1948,13 @@ void tracing_reset_all_online_cpus(void)
}
}
+void tracing_reset_all_online_cpus(void)
+{
+ mutex_lock(&trace_types_lock);
+ tracing_reset_all_online_cpus_unlocked();
+ mutex_unlock(&trace_types_lock);
+}
+
/*
* The tgid_map array maps from pid to tgid; i.e. the value stored at index i
* is the tgid last observed corresponding to pid=i.
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f2ff39353e03..edc17a640ab3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -677,6 +677,7 @@ int tracing_is_enabled(void);
void tracing_reset_online_cpus(struct trace_buffer *buf);
void tracing_reset_current(int cpu);
void tracing_reset_all_online_cpus(void);
+void tracing_reset_all_online_cpus_unlocked(void);
int tracing_open_generic(struct inode *inode, struct file *filp);
int tracing_open_generic_tr(struct inode *inode, struct file *filp);
bool tracing_is_disabled(void);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 8f2cbc9ebb6e..a0675ecc8142 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2440,7 +2440,7 @@ static void trace_module_remove_events(struct module *mod)
* over from this module may be passed to the new module events and
* unexpected results may occur.
*/
- tracing_reset_all_online_cpus();
+ tracing_reset_all_online_cpus_unlocked();
}
static int trace_module_notify(struct notifier_block *self,
--
2.25.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
commit e18eb8783ec4949adebc7d7b0fdb65f65bfeefd9 upstream.
Currently the tracing_reset_all_online_cpus() requires the
trace_types_lock held. But only one caller of this function actually has
that lock held before calling it, and the other just takes the lock so
that it can call it. More users of this function is needed where the lock
is not held.
Add a tracing_reset_all_online_cpus_unlocked() function for the one use
case that calls it without being held, and also add a lockdep_assert to
make sure it is held when called.
Then have tracing_reset_all_online_cpus() take the lock internally, such
that callers do not need to worry about taking it.
Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
[Refers to commit message of be111ebd8868d4b7c041cb3c6102e1ae27d6dc1d,
this patch is pre-depended, and tracing_reset_all_online_cpus() should
be called after trace_types_lock is held as its comment describes.]
Fixes: be111ebd8868 ("tracing: Free buffers when a used dynamic event is removed")
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
---
kernel/trace/trace.c | 11 ++++++++++-
kernel/trace/trace.h | 1 +
kernel/trace/trace_events.c | 2 +-
kernel/trace/trace_events_synth.c | 2 --
4 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 482ec6606b7b..70526400e05c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2178,10 +2178,12 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
}
/* Must have trace_types_lock held */
-void tracing_reset_all_online_cpus(void)
+void tracing_reset_all_online_cpus_unlocked(void)
{
struct trace_array *tr;
+ lockdep_assert_held(&trace_types_lock);
+
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
if (!tr->clear_trace)
continue;
@@ -2193,6 +2195,13 @@ void tracing_reset_all_online_cpus(void)
}
}
+void tracing_reset_all_online_cpus(void)
+{
+ mutex_lock(&trace_types_lock);
+ tracing_reset_all_online_cpus_unlocked();
+ mutex_unlock(&trace_types_lock);
+}
+
/*
* The tgid_map array maps from pid to tgid; i.e. the value stored at index i
* is the tgid last observed corresponding to pid=i.
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 37f616bf5fa9..e5b505b5b7d0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -725,6 +725,7 @@ int tracing_is_enabled(void);
void tracing_reset_online_cpus(struct array_buffer *buf);
void tracing_reset_current(int cpu);
void tracing_reset_all_online_cpus(void);
+void tracing_reset_all_online_cpus_unlocked(void);
int tracing_open_generic(struct inode *inode, struct file *filp);
int tracing_open_generic_tr(struct inode *inode, struct file *filp);
bool tracing_is_disabled(void);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index bac13f24a96e..f8ed66f38175 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2661,7 +2661,7 @@ static void trace_module_remove_events(struct module *mod)
* over from this module may be passed to the new module events and
* unexpected results may occur.
*/
- tracing_reset_all_online_cpus();
+ tracing_reset_all_online_cpus_unlocked();
}
static int trace_module_notify(struct notifier_block *self,
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 18291ab35657..ee174de0b8f6 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -1363,7 +1363,6 @@ int synth_event_delete(const char *event_name)
mutex_unlock(&event_mutex);
if (mod) {
- mutex_lock(&trace_types_lock);
/*
* It is safest to reset the ring buffer if the module
* being unloaded registered any events that were
@@ -1375,7 +1374,6 @@ int synth_event_delete(const char *event_name)
* occur.
*/
tracing_reset_all_online_cpus();
- mutex_unlock(&trace_types_lock);
}
return ret;
--
2.25.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 306320034e8fbe7ee1cc4f5269c55658b4612048
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023061831-detention-overtime-783b@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 306320034e8fbe7ee1cc4f5269c55658b4612048 Mon Sep 17 00:00:00 2001
From: Bernhard Seibold <mail(a)bernhard-seibold.de>
Date: Fri, 2 Jun 2023 15:30:29 +0200
Subject: [PATCH] serial: lantiq: add missing interrupt ack
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, the error interrupt is never acknowledged, so once active it
will stay active indefinitely, causing the handler to be called in an
infinite loop.
Fixes: 2f0fc4159a6a ("SERIAL: Lantiq: Add driver for MIPS Lantiq SOCs.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Bernhard Seibold <mail(a)bernhard-seibold.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Message-ID: <20230602133029.546-1-mail(a)bernhard-seibold.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/lantiq.c b/drivers/tty/serial/lantiq.c
index a58e9277dfad..f1387f1024db 100644
--- a/drivers/tty/serial/lantiq.c
+++ b/drivers/tty/serial/lantiq.c
@@ -250,6 +250,7 @@ lqasc_err_int(int irq, void *_port)
struct ltq_uart_port *ltq_port = to_ltq_uart_port(port);
spin_lock_irqsave(<q_port->lock, flags);
+ __raw_writel(ASC_IRNCR_EIR, port->membase + LTQ_ASC_IRNCR);
/* clear any pending interrupts */
asc_update_bits(0, ASCWHBSTATE_CLRPE | ASCWHBSTATE_CLRFE |
ASCWHBSTATE_CLRROE, port->membase + LTQ_ASC_WHBSTATE);
The quilt patch titled
Subject: nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
has been removed from the -mm tree. Its filename was
nilfs2-prevent-general-protection-fault-in-nilfs_clear_dirty_page.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
Date: Mon, 12 Jun 2023 11:14:56 +0900
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+53369d11851d8f26735c(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/page.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/page.c~nilfs2-prevent-general-protection-fault-in-nilfs_clear_dirty_page
+++ a/fs/nilfs2/page.c
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct addr
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
- nilfs_clear_dirty_page(&folio->page, silent);
+
+ /*
+ * This folio may have been removed from the address
+ * space by truncation or invalidation when the lock
+ * was acquired. Skip processing in that case.
+ */
+ if (likely(folio->mapping == mapping))
+ nilfs_clear_dirty_page(&folio->page, silent);
+
folio_unlock(folio);
}
folio_batch_release(&fbatch);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: nilfs2: fix buffer corruption due to concurrent device reads
has been removed from the -mm tree. Its filename was
nilfs2-fix-buffer-corruption-due-to-concurrent-device-reads.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix buffer corruption due to concurrent device reads
Date: Fri, 9 Jun 2023 12:57:32 +0900
As a result of analysis of a syzbot report, it turned out that in three
cases where nilfs2 allocates block device buffers directly via sb_getblk,
concurrent reads to the device can corrupt the allocated buffers.
Nilfs2 uses sb_getblk for segment summary blocks, that make up a log
header, and the super root block, that is the trailer, and when moving and
writing the second super block after fs resize.
In any of these, since the uptodate flag is not set when storing metadata
to be written in the allocated buffers, the stored metadata will be
overwritten if a device read of the same block occurs concurrently before
the write. This causes metadata corruption and misbehavior in the log
write itself, causing warnings in nilfs_btree_assign() as reported.
Fix these issues by setting an uptodate flag on the buffer head on the
first or before modifying each buffer obtained with sb_getblk, and
clearing the flag on failure.
When setting the uptodate flag, the lock_buffer/unlock_buffer pair is used
to perform necessary exclusive control, and the buffer is filled to ensure
that uninitialized bytes are not mixed into the data read from others. As
for buffers for segment summary blocks, they are filled incrementally, so
if the uptodate flag was unset on their allocation, set the flag and zero
fill the buffer once at that point.
Also, regarding the superblock move routine, the starting point of the
memset call to zerofill the block is incorrectly specified, which can
cause a buffer overflow on file systems with block sizes greater than
4KiB. In addition, if the superblock is moved within a large block, it is
necessary to assume the possibility that the data in the superblock will
be destroyed by zero-filling before copying. So fix these potential
issues as well.
Link: https://lkml.kernel.org/r/20230609035732.20426-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+31837fe952932efc8fb9(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000030000a05e981f475@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segbuf.c | 6 ++++++
fs/nilfs2/segment.c | 7 +++++++
fs/nilfs2/super.c | 23 ++++++++++++++++++++++-
3 files changed, 35 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/segbuf.c~nilfs2-fix-buffer-corruption-due-to-concurrent-device-reads
+++ a/fs/nilfs2/segbuf.c
@@ -101,6 +101,12 @@ int nilfs_segbuf_extend_segsum(struct ni
if (unlikely(!bh))
return -ENOMEM;
+ lock_buffer(bh);
+ if (!buffer_uptodate(bh)) {
+ memset(bh->b_data, 0, bh->b_size);
+ set_buffer_uptodate(bh);
+ }
+ unlock_buffer(bh);
nilfs_segbuf_add_segsum_buffer(segbuf, bh);
return 0;
}
--- a/fs/nilfs2/segment.c~nilfs2-fix-buffer-corruption-due-to-concurrent-device-reads
+++ a/fs/nilfs2/segment.c
@@ -981,10 +981,13 @@ static void nilfs_segctor_fill_in_super_
unsigned int isz, srsz;
bh_sr = NILFS_LAST_SEGBUF(&sci->sc_segbufs)->sb_super_root;
+
+ lock_buffer(bh_sr);
raw_sr = (struct nilfs_super_root *)bh_sr->b_data;
isz = nilfs->ns_inode_size;
srsz = NILFS_SR_BYTES(isz);
+ raw_sr->sr_sum = 0; /* Ensure initialization within this update */
raw_sr->sr_bytes = cpu_to_le16(srsz);
raw_sr->sr_nongc_ctime
= cpu_to_le64(nilfs_doing_gc() ?
@@ -998,6 +1001,8 @@ static void nilfs_segctor_fill_in_super_
nilfs_write_inode_common(nilfs->ns_sufile, (void *)raw_sr +
NILFS_SR_SUFILE_OFFSET(isz), 1);
memset((void *)raw_sr + srsz, 0, nilfs->ns_blocksize - srsz);
+ set_buffer_uptodate(bh_sr);
+ unlock_buffer(bh_sr);
}
static void nilfs_redirty_inodes(struct list_head *head)
@@ -1780,6 +1785,7 @@ static void nilfs_abort_logs(struct list
list_for_each_entry(segbuf, logs, sb_list) {
list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
b_assoc_buffers) {
+ clear_buffer_uptodate(bh);
if (bh->b_page != bd_page) {
if (bd_page)
end_page_writeback(bd_page);
@@ -1791,6 +1797,7 @@ static void nilfs_abort_logs(struct list
b_assoc_buffers) {
clear_buffer_async_write(bh);
if (bh == segbuf->sb_super_root) {
+ clear_buffer_uptodate(bh);
if (bh->b_page != bd_page) {
end_page_writeback(bd_page);
bd_page = bh->b_page;
--- a/fs/nilfs2/super.c~nilfs2-fix-buffer-corruption-due-to-concurrent-device-reads
+++ a/fs/nilfs2/super.c
@@ -372,10 +372,31 @@ static int nilfs_move_2nd_super(struct s
goto out;
}
nsbp = (void *)nsbh->b_data + offset;
- memset(nsbp, 0, nilfs->ns_blocksize);
+ lock_buffer(nsbh);
if (sb2i >= 0) {
+ /*
+ * The position of the second superblock only changes by 4KiB,
+ * which is larger than the maximum superblock data size
+ * (= 1KiB), so there is no need to use memmove() to allow
+ * overlap between source and destination.
+ */
memcpy(nsbp, nilfs->ns_sbp[sb2i], nilfs->ns_sbsize);
+
+ /*
+ * Zero fill after copy to avoid overwriting in case of move
+ * within the same block.
+ */
+ memset(nsbh->b_data, 0, offset);
+ memset((void *)nsbp + nilfs->ns_sbsize, 0,
+ nsbh->b_size - offset - nilfs->ns_sbsize);
+ } else {
+ memset(nsbh->b_data, 0, nsbh->b_size);
+ }
+ set_buffer_uptodate(nsbh);
+ unlock_buffer(nsbh);
+
+ if (sb2i >= 0) {
brelse(nilfs->ns_sbh[sb2i]);
nilfs->ns_sbh[sb2i] = nsbh;
nilfs->ns_sbp[sb2i] = nsbp;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: scripts: fix the gfp flags header path in gfp-translate
has been removed from the -mm tree. Its filename was
scripts-fix-the-gfp-flags-header-path-in-gfp-translate.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Prathu Baronia <prathubaronia2011(a)gmail.com>
Subject: scripts: fix the gfp flags header path in gfp-translate
Date: Thu, 8 Jun 2023 21:14:49 +0530
Since gfp flags have been shifted to gfp_types.h so update the path in
the gfp-translate script.
Link: https://lkml.kernel.org/r/20230608154450.21758-1-prathubaronia2011@gmail.com
Fixes: cb5a065b4ea9c ("headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>")
Signed-off-by: Prathu Baronia <prathubaronia2011(a)gmail.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Nicolas Schier <nicolas(a)fjasle.eu>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Yury Norov <yury.norov(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gfp-translate | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/scripts/gfp-translate~scripts-fix-the-gfp-flags-header-path-in-gfp-translate
+++ a/scripts/gfp-translate
@@ -63,11 +63,11 @@ fi
# Extract GFP flags from the kernel source
TMPFILE=`mktemp -t gfptranslate-XXXXXX` || exit 1
-grep -q ___GFP $SOURCE/include/linux/gfp.h
+grep -q ___GFP $SOURCE/include/linux/gfp_types.h
if [ $? -eq 0 ]; then
- grep "^#define ___GFP" $SOURCE/include/linux/gfp.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE
+ grep "^#define ___GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE
else
- grep "^#define __GFP" $SOURCE/include/linux/gfp.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)\//) \//' > $TMPFILE
+ grep "^#define __GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)\//) \//' > $TMPFILE
fi
# Parse the flags
_
Patches currently in -mm which might be from prathubaronia2011(a)gmail.com are
The quilt patch titled
Subject: udmabuf: revert 'Add support for mapping hugepages (v4)'
has been removed from the -mm tree. Its filename was
udmabuf-revert-add-support-for-mapping-hugepages-v4.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: udmabuf: revert 'Add support for mapping hugepages (v4)'
Date: Thu, 8 Jun 2023 13:49:27 -0700
This effectively reverts commit 16c243e99d33 ("udmabuf: Add support for
mapping hugepages (v4)"). Recently, Junxiao Chang found a BUG with page
map counting as described here [1]. This issue pointed out that the
udmabuf driver was making direct use of subpages of hugetlb pages. This
is not a good idea, and no other mm code attempts such use. In addition
to the mapcount issue, this also causes issues with hugetlb vmemmap
optimization and page poisoning.
For now, remove hugetlb support.
If udmabuf wants to be used on hugetlb mappings, it should be changed to
only use complete hugetlb pages. This will require different alignment
and size requirements on the UDMABUF_CREATE API.
[1] https://lore.kernel.org/linux-mm/20230512072036.1027784-1-junxiao.chang@int…
Link: https://lkml.kernel.org/r/20230608204927.88711-1-mike.kravetz@oracle.com
Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Acked-by: Gerd Hoffmann <kraxel(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Dongwon Kim <dongwon.kim(a)intel.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Jerome Marchand <jmarchan(a)redhat.com>
Cc: Junxiao Chang <junxiao.chang(a)intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dma-buf/udmabuf.c | 47 ++++--------------------------------
1 file changed, 6 insertions(+), 41 deletions(-)
--- a/drivers/dma-buf/udmabuf.c~udmabuf-revert-add-support-for-mapping-hugepages-v4
+++ a/drivers/dma-buf/udmabuf.c
@@ -12,7 +12,6 @@
#include <linux/shmem_fs.h>
#include <linux/slab.h>
#include <linux/udmabuf.h>
-#include <linux/hugetlb.h>
#include <linux/vmalloc.h>
#include <linux/iosys-map.h>
@@ -207,9 +206,7 @@ static long udmabuf_create(struct miscde
struct udmabuf *ubuf;
struct dma_buf *buf;
pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
- struct page *page, *hpage = NULL;
- pgoff_t subpgoff, maxsubpgs;
- struct hstate *hpstate;
+ struct page *page;
int seals, ret = -EINVAL;
u32 i, flags;
@@ -245,7 +242,7 @@ static long udmabuf_create(struct miscde
if (!memfd)
goto err;
mapping = memfd->f_mapping;
- if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
+ if (!shmem_mapping(mapping))
goto err;
seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
if (seals == -EINVAL)
@@ -256,48 +253,16 @@ static long udmabuf_create(struct miscde
goto err;
pgoff = list[i].offset >> PAGE_SHIFT;
pgcnt = list[i].size >> PAGE_SHIFT;
- if (is_file_hugepages(memfd)) {
- hpstate = hstate_file(memfd);
- pgoff = list[i].offset >> huge_page_shift(hpstate);
- subpgoff = (list[i].offset &
- ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
- maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
- }
for (pgidx = 0; pgidx < pgcnt; pgidx++) {
- if (is_file_hugepages(memfd)) {
- if (!hpage) {
- hpage = find_get_page_flags(mapping, pgoff,
- FGP_ACCESSED);
- if (!hpage) {
- ret = -EINVAL;
- goto err;
- }
- }
- page = hpage + subpgoff;
- get_page(page);
- subpgoff++;
- if (subpgoff == maxsubpgs) {
- put_page(hpage);
- hpage = NULL;
- subpgoff = 0;
- pgoff++;
- }
- } else {
- page = shmem_read_mapping_page(mapping,
- pgoff + pgidx);
- if (IS_ERR(page)) {
- ret = PTR_ERR(page);
- goto err;
- }
+ page = shmem_read_mapping_page(mapping, pgoff + pgidx);
+ if (IS_ERR(page)) {
+ ret = PTR_ERR(page);
+ goto err;
}
ubuf->pages[pgbuf++] = page;
}
fput(memfd);
memfd = NULL;
- if (hpage) {
- put_page(hpage);
- hpage = NULL;
- }
}
exp_info.ops = &udmabuf_ops;
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The quilt patch titled
Subject: memfd: check for non-NULL file_seals in memfd_create() syscall
has been removed from the -mm tree. Its filename was
memfd-check-for-non-null-file_seals-in-memfd_create-syscall.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Subject: memfd: check for non-NULL file_seals in memfd_create() syscall
Date: Wed, 7 Jun 2023 15:24:27 +0200
Ensure that file_seals is non-NULL before using it in the memfd_create()
syscall. One situation in which memfd_file_seals_ptr() could return a
NULL pointer when CONFIG_SHMEM=n, oopsing the kernel.
Link: https://lkml.kernel.org/r/20230607132427.2867435-1-roberto.sassu@huaweiclou…
Fixes: 47b9012ecdc7 ("shmem: add sealing support to hugetlb-backed memfd")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: Marc-Andr Lureau <marcandre.lureau(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memfd.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/mm/memfd.c~memfd-check-for-non-null-file_seals-in-memfd_create-syscall
+++ a/mm/memfd.c
@@ -371,12 +371,15 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- *file_seals &= ~F_SEAL_SEAL;
- *file_seals |= F_SEAL_EXEC;
+ if (file_seals) {
+ *file_seals &= ~F_SEAL_SEAL;
+ *file_seals |= F_SEAL_EXEC;
+ }
} else if (flags & MFD_ALLOW_SEALING) {
/* MFD_EXEC and MFD_ALLOW_SEALING are set */
file_seals = memfd_file_seals_ptr(file);
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
+ *file_seals &= ~F_SEAL_SEAL;
}
fd_install(fd, file);
_
Patches currently in -mm which might be from roberto.sassu(a)huawei.com are
shmem-use-ramfs_kill_sb-for-kill_sb-method-of-ramfs-based-tmpfs.patch
The quilt patch titled
Subject: mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
has been removed from the -mm tree. Its filename was
mm-vmalloc-do-not-output-a-spurious-warning-when-huge-vmalloc-fails.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lstoakes(a)gmail.com>
Subject: mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
Date: Mon, 5 Jun 2023 21:11:07 +0100
In __vmalloc_area_node() we always warn_alloc() when an allocation
performed by vm_area_alloc_pages() fails unless it was due to a pending
fatal signal.
However, huge page allocations instigated either by vmalloc_huge() or
__vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
kvmalloc_node()) always falls back to order-0 allocations if the huge page
allocation fails.
This renders the warning useless and noisy, especially as all callers
appear to be aware that this may fallback. This has already resulted in
at least one bug report from a user who was confused by this (see link).
Therefore, simply update the code to only output this warning for order-0
pages when no fatal signal is pending.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
Link: https://lkml.kernel.org/r/20230605201107.83298-1-lstoakes@gmail.com
Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
Signed-off-by: Lorenzo Stoakes <lstoakes(a)gmail.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-do-not-output-a-spurious-warning-when-huge-vmalloc-fails
+++ a/mm/vmalloc.c
@@ -3098,11 +3098,20 @@ static void *__vmalloc_area_node(struct
* allocation request, free them via vfree() if any.
*/
if (area->nr_pages != nr_small_pages) {
- /* vm_area_alloc_pages() can also fail due to a fatal signal */
- if (!fatal_signal_pending(current))
+ /*
+ * vm_area_alloc_pages() can fail due to insufficient memory but
+ * also:-
+ *
+ * - a pending fatal signal
+ * - insufficient huge page-order pages
+ *
+ * Since we always retry allocations at order-0 in the huge page
+ * case a warning for either is spurious.
+ */
+ if (!fatal_signal_pending(current) && page_order == 0)
warn_alloc(gfp_mask, NULL,
- "vmalloc error: size %lu, page order %u, failed to allocate pages",
- area->nr_pages * PAGE_SIZE, page_order);
+ "vmalloc error: size %lu, failed to allocate pages",
+ area->nr_pages * PAGE_SIZE);
goto fail;
}
_
Patches currently in -mm which might be from lstoakes(a)gmail.com are